CROSS-REFERENCES TO RELATED APPLICATIONS
The present application derives priority from USSN 60/148,422 filed 
August 11, 2000, which is incorporated by reference in its entirety for all purposes.



TECHNICAL FIELD 
The invention resides in the technical field of protein engineering. 



BACKGROUND OF THE INVENTION

Zinc finger proteins (ZFPs) are proteins that can bind to DNA in a
sequence-specific manner. Zinc fingers were first identified in the transcription factor 
TFIIIA from the oocytes of the African clawed toad, Xenopus laevis. An exemplary 
motif characterizing one class of these proteins (C2H2 class) is
-Cys-(X)2-4-Cys-(X)12-His-(X)3-5-His (where X is any amino acid). A single finger domain is about 30 amino acids
in length, and several structural studies have demonstrated that it contains an alpha helix 
containing the two invariant histidine residues and two invariant cysteine residues in a 
beta turn co-ordinated through zinc. To date, over 10,000 zinc finger sequences have been 
identified in several thousand known or putative transcription factors. Zinc finger
domains are involved not only in DNA-recognition, but also in RNA binding and in
protein-protein binding. Current estimates are that this class of molecules will constitute 
about 2% of all human genes. 
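
The C2H2 motif given above can be written as a regular expression. The following is a minimal sketch, assuming single-letter amino acid codes; the function name and the example sequence are hypothetical and are not taken from any real protein.

```python
import re

# C2H2 motif from the text: Cys-(X)2-4-Cys-(X)12-His-(X)3-5-His,
# where X is any amino acid (single-letter codes assumed).
C2H2_PATTERN = re.compile(r"C.{2,4}C.{12}H.{3,5}H")

def find_c2h2_fingers(protein_seq):
    """Return (start, matched_text) for each non-overlapping motif match."""
    return [(m.start(), m.group()) for m in C2H2_PATTERN.finditer(protein_seq)]

# Hypothetical sequence constructed only to fit the pattern.
example = "MA" + "C" + "AA" + "C" + "A" * 12 + "H" + "AAA" + "H" + "GS"
print(find_c2h2_fingers(example))
```

A match of this pattern indicates only that the spacing of cysteine and histidine residues fits the motif; it does not by itself establish that the segment folds as a zinc finger.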

The x-ray crystal structure of Zif268, a three-finger domain from a murine 
transcription factor, has been solved in complex with a cognate DNA-sequence and
shows that each finger can be superimposed on the next by a periodic rotation. The
structure suggests that each finger interacts independently with DNA over 3 base-pair 
intervals, with side-chains at positions -1, 2, 3 and 6 on each recognition helix making
contacts with their respective DNA triplet subsites. The amino terminus of Zif268 is 
situated at the 3' end of the DNA strand with which it makes most contacts. DNA
recognition subsite. Recent results have indicated that some zinc fingers can bind to a
fourth base in a target segment (Isalan et al., PNAS 94, 5617-5621 (1997)). If the strand 
with which a zinc finger protein makes most contacts is designated the target strand, some 
zinc finger proteins bind to a three base triplet in the target strand and a fourth base on the 
nontarget strand. The fourth base is complementary to the base immediately 3' of the
three base subsite. 

The structure of the Zif268-DNA complex also suggested that the DNA 
sequence specificity of a zinc finger protein might be altered by making amino acid 
substitutions at the four helix positions (-1, 2, 3 and 6) on each of the zinc finger
recognition helices. Phage display experiments using zinc finger combinatorial libraries 
to test this observation were published in a series of papers in 1994 (Rebar et al., Science
263, 671-673 (1994); Jamieson et al., Biochemistry 33, 5689-5695 (1994); Choo et al.,
PNAS 91, 11163-11167 (1994)). Combinatorial libraries were constructed with
randomized side-chains in either the first or middle finger of Zif268 and then used to
select for an altered Zif268 binding site in which the appropriate DNA sub-site was 
replaced by an altered DNA triplet. Further, correlation between the nature of introduced 
mutations and the resulting alteration in binding specificity gave rise to a partial set of 
substitution rules for design of ZFPs with altered binding specificity. 
Greisman & Pabo, Science 275, 657-661 (1997) discuss an elaboration of
the phage display method in which each finger of Zif268 was successively randomized
and selected for binding to a new triplet sequence. This paper reported selection of ZFPs 
for a nuclear hormone response element, a p53 target site and a TATA box sequence. 

A number of papers have reported attempts to produce ZFPs to modulate 
particular target sites. For example, Choo et al., Nature 372, 645 (1994), report an
attempt to design a ZFP that would repress expression of a bcr-abl oncogene. The target
segment to which the ZFPs would bind was a nine base sequence 5'GCA GAA GCC3'
chosen to overlap the junction created by a specific oncogenic translocation fusing the
genes encoding bcr and abl. The intention was that a ZFP specific to this target site
would bind to the oncogene without binding to abl or bcr component genes. The authors
used phage display to screen a mini-library of variant ZFPs for binding to this target
segment. A variant ZFP thus isolated was then reported to repress expression of a stably
transfected bcr-abl construct in a cell line.

Pomerantz et al., Science 267, 93-96 (1995) reported an attempt to design 
a novel DNA binding protein by fusing two fingers from Zif268 with a homeodomain
from Oct-1. The hybrid protein was then fused with a transcriptional activator for
expression as a chimeric protein. The chimeric protein was reported to bind a target site 
representing a hybrid of the subsites of its two components. The authors then constructed 
a reporter vector containing a luciferase gene operably linked to a promoter and a hybrid 
site for the chimeric DNA binding protein in proximity to the promoter. The authors
reported that their chimeric DNA binding protein could activate expression of the 
luciferase gene. 

Liu et al., PNAS 94, 5525-5530 (1997) report forming a composite zinc 
finger protein by using a peptide spacer to link two component zinc finger proteins each
having three fingers. The composite protein was then further linked to a transcriptional
activation domain. It was reported that the resulting chimeric protein bound to a target
site formed from the target segments bound by the two component zinc finger proteins. It
was further reported that the chimeric zinc finger protein could activate transcription of a
reporter gene when its target site was inserted into a reporter plasmid in proximity to a
promoter operably linked to the reporter. 

Choo et al., WO 98/53058, WO 98/53059, and WO 98/53060 (1998)
discuss selection of zinc finger proteins to bind to a target site within the HIV Tat gene. 
Choo et al. also discuss selection of a zinc finger protein to bind to a target site 
encompassing a site of a common mutation in the oncogene ras. The target site within ras
was thus constrained by the position of the mutation. 

The present application is related to copending applications 09/229,007 
filed January 12, 1999 (WO 00/42219) and 09/229,037 filed January 12, 1999 (WO 
00/41566), both of which are incorporated by reference in their entirety for all purposes.


SUMMARY OF THE CLAIMED INVENTION 
The invention provides nonnaturally occurring dimerizing peptides. Some 
such peptides are homo-dimerizing peptides. Such peptides typically lack significant 
sequence identity with a naturally occurring peptide. Some peptides have a length of 30 
amino acids or shorter.

The invention also provides zinc finger complexes. Such a complex 
comprises a first fusion protein comprising a first zinc finger protein and a first peptide 
linker and a second fusion protein comprising a second zinc finger protein and a second 
peptide linker. The first and second fusion proteins are complexed by specific binding of 
the first and second peptide linkers, and the first and second peptide linkers are
nonnaturally occurring peptides. In some complexes, the first and second peptide linkers
are first and second copies of the same linker. 

The invention further provides methods of selecting a dimerizing peptide. 
Such methods entail providing a phage display library in which a member displays a zinc 
finger protein fused to a peptide from its outer surface, the zinc finger protein being the
same in different members, and the peptide varying between different members. The 
library is then contacted with a nucleic acid substrate comprising first and second binding 
sites for the zinc finger protein. Phage displaying a zinc finger protein fused to a 
dimerizing peptide preferentially bind to the substrate relative to phage displaying a zinc
finger protein fused to a nondimerizing peptide. The phage that bind to the substrate are
isolated. A segment of the genome of a phage binding to the substrate is sequenced to 
determine the identity of a dimerizing peptide. In some such methods, the first and 
second binding sites are in opposing orientations in the substrate. In some methods, the 
phage displaying a zinc finger protein fused to a dimerizing peptide bind to the
substrate via display of two copies of the zinc finger protein and the dimerizing peptide,
whereby the two copies of the zinc finger protein respectively bind to the first and second 
binding sites, and the two copies of the dimerizing peptide bind to each other. In some 
methods, the peptide is a random peptide. In some methods, the peptide is 30 amino 
acids or fewer in length.

The invention further provides methods of regulating or detecting a target 
sequence. Such methods entail contacting the target sequence with a zinc finger 
complex, comprising a first fusion protein comprising a first zinc finger protein that 
specifically binds a segment of the target sequence and a first peptide linker and a second 
fusion protein comprising a second zinc finger protein that specifically binds a second
segment of the target sequence and a second peptide linker. The first fusion protein binds
to the first segment of the target sequence, and the second fusion protein binds to the 
second segment of the target sequence, and the first and second fusion proteins bind to 
each other via the first and second peptides. In some such methods, the target sequence is 
present in an intact cell. Some such methods further comprise contacting the cell with an
expression vector encoding the first fusion protein and/or the second fusion protein, 
wherein the expression vector enters the cell and is expressed to produce the first and/or 
second fusion protein. In some methods, the target sequence is present in a patient. In 
some methods, the target sequence is present in a cell extract.


BRIEF DESCRIPTION OF THE DRAWINGS 
Fig. 1 shows a three finger zinc finger protein bound to a target site 
containing three D-able subsites. 
Fig. 2 shows the process of assembling a nucleic acid encoding a designed
ZFP.

DEFINITIONS 

A zinc finger DNA binding protein is a protein or segment within a larger 
protein that binds DNA in a sequence-specific manner as a result of stabilization of 
protein structure through coordination of a zinc ion. The term zinc finger DNA binding
protein is often abbreviated as zinc finger protein or ZFP. 

A designed zinc finger protein is a protein not occurring in nature whose 
design/composition results principally from rational criteria. Rational criteria for design 
include application of substitution rules and computerized algorithms for processing 
information in a database storing information of existing ZFP designs and binding data. 

A selected zinc finger protein is a protein not found in nature whose 
production results primarily from an empirical process such as phage display. 

The term naturally-occurring is used to describe an object that can be 
found in nature as distinct from being artificially produced by man. For example, a 
polypeptide or polynucleotide sequence that is present in an organism (including viruses) 
that can be isolated from a source in nature and which has not been intentionally modified 
by man in the laboratory is naturally-occurring. Generally, the term naturally-occurring 
refers to an object as present in a non-pathological (undiseased) individual, such as would 
be typical for the species. 

Conversely, the term nonnaturally-occurring is used to describe objects 
and sequences not found in nature. Preferred nonnaturally occurring sequences show no 
significant sequence identity, e.g., less than 50% (amino acid or nucleotide) with natural 
sequences, in distinction from induced mutations of natural sequences. Typically, 
nonnaturally occurring sequences do not share a contiguous segment of at least half
their length with a natural protein. Some nonnaturally occurring peptides fold in 
conformations distinct from natural peptides. Some nonnaturally occurring sequences are 
selected from random peptide libraries. 

Random peptide refers to an oligomer composed of two or more amino 
acid monomers and constructed by a means with which one does not entirely preselect the 
complete sequence of a particular oligomer. 

A random peptide library refers not only to a set of recombinant DNA 
vectors (also called recombinants) that encodes a set of random peptides, but also to the 
set of random peptides encoded by those vectors, as well as the set of fusion proteins
containing those random peptides. Random peptide libraries frequently contain as many
as 10^6 to 10^12 different compounds.

A nucleic acid is operably linked when it is placed into a functional 
relationship with another nucleic acid sequence. For instance, a promoter or enhancer is 
operably linked to a coding sequence if it increases the transcription of the coding
sequence. Operably linked means that the DNA sequences being linked are typically 
contiguous and, where necessary to join two protein coding regions, contiguous and in 
reading frame. However, since enhancers generally function when separated from the 
promoter by up to several kilobases or more and intronic sequences may be of variable 
lengths, some polynucleotide elements may be operably linked but not contiguous.

A specific binding affinity between, for example, a ZFP and a specific 
target site means a binding affinity of at least 1 x 10^6 M^-1.

The terms "modulating expression," "inhibiting expression," and "activating
expression" of a gene refer to the ability of a zinc finger protein to activate or inhibit
transcription of a gene. Activation includes prevention of subsequent transcriptional
inhibition (i.e., prevention of repression of gene expression) and inhibition includes 
prevention of subsequent transcriptional activation (i.e., prevention of gene activation). 
Modulation can be assayed by determining any parameter that is indirectly or directly 
affected by the expression of the target gene. Such parameters include, e.g., changes in 
RNA or protein levels, changes in protein activity, changes in product levels, changes in
downstream gene expression, changes in reporter gene transcription (luciferase, CAT,
beta-galactosidase, GFP (see, e.g., Misteli & Spector, Nature Biotechnology 15:961-964
(1997))); changes in signal transduction, phosphorylation and dephosphorylation,
receptor-ligand interactions, second messenger concentrations (e.g., cGMP, cAMP, IP3, and
Ca2+), cell growth, neovascularization, in vitro, in vivo, and ex vivo. Such functional
effects can be measured by conventional methods, e.g., measurement of RNA or protein
levels, measurement of RNA stability, identification of downstream or reporter gene
expression, e.g., via chemiluminescence, fluorescence, colorimetric reactions, antibody
binding, inducible markers, ligand binding assays; changes in intracellular second
messengers such as cGMP and inositol triphosphate (IP3); changes in intracellular
calcium levels; cytokine release, and the like. 

A "regulatory domain" refers to a protein or a protein subsequence that has 
transcriptional modulation activity. Typically, a regulatory domain is covalently or
non-covalently linked to a ZFP to modulate transcription. Alternatively, a ZFP can act alone,
without a regulatory domain, or with multiple regulatory domains to modulate
transcription. 

A D-able subsite within a target site has the motif 5'NNGK3'. A target 
site containing one or more such motifs is sometimes described as a D-able target site. A
zinc finger appropriately designed to bind to a D-able subsite is sometimes referred to as
a D-able finger. Likewise a zinc finger protein containing at least one finger designed or 
selected to bind to a target site including at least one D-able subsite is sometimes referred 
to as a D-able zinc finger protein. 
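
The 5'NNGK3' motif can be located in a candidate target sequence with a short scan. The sketch below is illustrative only. It assumes the standard IUPAC reading of the motif letters (N is any base, K is G or T), it reports matches at every offset rather than only those in a particular triplet frame, and the function name and example sequence are hypothetical.

```python
import re

# 5'NNGK3' with N = any base and K = G or T (IUPAC codes; an assumption here).
# A lookahead is used so that overlapping subsites are all reported.
D_ABLE = re.compile(r"(?=([ACGT]{2}G[GT]))")

def find_d_able_subsites(target_5to3):
    """Return (offset, subsite) pairs for every 5'NNGK3' match in a 5'->3' sequence."""
    seq = target_5to3.upper()
    return [(m.start(), m.group(1)) for m in D_ABLE.finditer(seq)]

# Example input chosen for illustration only.
print(find_d_able_subsites("GCGTGGGCG"))
```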

For sequence comparison and homology determination, typically one
sequence acts as a reference sequence to which test sequences are compared. When using
a sequence comparison algorithm, test and reference sequences are input into a computer, 
subsequence coordinates are designated, if necessary, and sequence algorithm program 
parameters are designated. The sequence comparison algorithm then calculates the 
percent sequence identity for the test sequence(s) relative to the reference sequence, based
on the designated program parameters.

Optimal alignment of sequences for comparison can be conducted, e.g., by 
the local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2:482 (1981), by 
the homology alignment algorithm of Needleman & Wunsch, J. Mol. Biol. 48:443 (1970), 
by the search for similarity method of Pearson & Lipman, Proc. Nat'l. Acad. Sci. USA
85:2444 (1988), by computerized implementations of these algorithms (GAP, BESTFIT,
FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer 
Group, 575 Science Dr., Madison, WI), or by visual inspection (see generally, Ausubel et 
al., infra). 

One example of an algorithm that is suitable for determining percent
sequence identity and sequence similarity is the BLAST algorithm, which is described in
Altschul et al., J. Mol. Biol. 215:403-410 (1990). Software for performing BLAST 
analyses is publicly available through the National Center for Biotechnology Information 
(http://www.ncbi.nlm.nih.gov/). This algorithm involves first identifying high scoring 
sequence pairs (HSPs) by identifying short words of length W in the query sequence, 
which either match or satisfy some positive-valued threshold score T when aligned with a
word of the same length in a database sequence. T is referred to as the neighborhood 
word score threshold (Altschul et al., supra). These initial neighborhood word hits act as 
seeds for initiating searches to find longer HSPs containing them. The word hits are then 
extended in both directions along each sequence for as far as the cumulative alignment 
score can be increased. Cumulative scores are calculated using, for nucleotide sequences,
the parameters M (reward score for a pair of matching residues; always > 0) and N 
(penalty score for mismatching residues; always < 0). For amino acid sequences, a 
scoring matrix is used to calculate the cumulative score. Extension of the word hits in
each direction is halted when: the cumulative alignment score falls off by the quantity X
from its maximum achieved value; the cumulative score goes to zero or below, due to the 
accumulation of one or more negative-scoring residue alignments; or the end of either 
sequence is reached. The BLAST algorithm parameters W, T, and X determine the 
sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences)
uses as defaults a wordlength (W) of 11, an expectation (E) of 10, a cutoff of 100, M=5,
N=-4, and a comparison of both strands. For amino acid sequences, the BLASTP program 
uses as defaults a wordlength (W) of 3, an expectation (E) of 10, and the BLOSUM62 
scoring matrix (see Henikoff & Henikoff (1989) Proc. Natl. Acad. Sci. USA 89:10915). 
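
The halting conditions for word-hit extension described above can be illustrated for nucleotide sequences as follows. This is a simplified, ungapped, one-directional sketch and not the BLAST implementation itself. The function name and the drop-off value of 20 are assumptions chosen for illustration; M=5 and N=-4 are the values given in the text.

```python
def extend_right(query, subject, q_start, s_start, match=5, mismatch=-4, x_drop=20):
    """Ungapped rightward extension of a word hit.

    Returns (best_score, best_length). Extension halts when the score falls
    x_drop below its maximum, when it reaches zero or below, or when either
    sequence ends.
    """
    score = 0
    best = 0
    best_len = 0
    i = 0
    while q_start + i < len(query) and s_start + i < len(subject):
        score += match if query[q_start + i] == subject[s_start + i] else mismatch
        i += 1
        if score > best:
            best, best_len = score, i
        if best - score >= x_drop or score <= 0:
            break
    return best, best_len

# Eight matching bases followed by mismatches.
print(extend_right("ACGTACGTTTTT", "ACGTACGTAAAA", 0, 0))
```

The returned length is that of the highest-scoring prefix, so trailing mismatches examined before the halt are not included in the reported segment.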

In addition to calculating percent sequence identity, the BLAST algorithm
also performs a statistical analysis of the similarity between two sequences (see, e.g.,
Karlin & Altschul (1993) Proc. Nat'l. Acad. Sci. USA 90:5873-5787). One measure of 
similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), 
which provides an indication of the probability by which a match between two nucleotide 
or amino acid sequences would occur by chance. For example, a nucleic acid is
considered similar to a reference sequence if the smallest sum probability in a comparison
of the test nucleic acid to the reference nucleic acid is less than about 0.1, more preferably 
less than about 0.01, and most preferably less than about 0.001. 



DETAILED DESCRIPTION 

I. General

The application provides methods for selecting dimerization peptides that 
mediate association of linked functional protein domains. The peptides can mediate such
association by homodimerizing with each other, by heterodimerizing with the linked 
protein domains, or by binding to an entity, such as a DNA target site, itself bound by the 
linked protein domains. In particular, such peptides are useful for mediating association
of complexes of multiple zinc finger proteins thereby affording greater specificity and/or 
affinity in binding of the zinc finger proteins to proximately spaced target segments. 

Dimerizing peptides can be selected from a phage display library among 
other methods. A phage or phagemid vector is genetically engineered so that a zinc
finger protein fused to a peptide being screened is displayed from the outer surface of
the phage particles as a fusion with a phage coat protein. Typically, the same zinc
finger protein is displayed from each phage. The peptide being screened varies between
phage. Typically, the peptides constitute a random peptide library. The peptide size can
vary from about 2-500 amino acids, with sizes of 8-25 amino acids being preferred.
Typical libraries contain 10^6-10^10 members.

Libraries are screened by contacting the library with a nucleic acid target 
containing two binding segments for the zinc finger protein displayed by the phage. 
Typically, the two binding segments are in opposing orientations (i.e., the 5'-3' strand of
one segment is the 3'-5' strand of the other, and vice versa). Although an understanding
of mechanism is not required for practice of the invention, it is believed that phage
displaying two copies of the zinc finger protein and linked peptide can bind to the target
via both specific binding of each copy of the protein to a respective target segment, and
supporting interactions made by the peptides. Such supporting interactions can include
contacts between the peptides and/or contacts between a peptide and another region of the
protein (e.g., an adjacent zinc finger), and can also be stabilized by peptide-DNA
contacts. Contacts between peptides and adjacent zinc fingers are facilitated by a
hydrophobic patch on the surface of the zinc finger. The resulting arrangement in which a
phage is effectively chelated to the target segment provides substantially stronger binding
than can be mediated by phage lacking a dimerizing peptide. When the peptide-ZFP
fusions displayed by such phage are purified, they are able to bind more tightly to the
target sequence than the zinc finger portions alone. Accordingly, conditions of
appropriate stringency can be devised such that phage displaying a zinc finger protein and
dimerizing peptide can be selectively enriched and separated from other phage lacking
such a peptide.
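
A nucleic acid target with two binding segments in opposing orientations can be composed by joining a binding segment, an optional spacer, and the reverse complement of the same segment. The following is a minimal sketch under that assumption. The function names, the example site, and the use of a spacer are hypothetical; the text does not specify a spacer or its length.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def opposing_site_substrate(site_5to3, spacer=""):
    """Top strand (5'->3') carrying the site, a spacer, then the site's reverse complement."""
    return site_5to3.upper() + spacer.upper() + reverse_complement(site_5to3)

# Hypothetical 9-base binding segment and 2-base spacer.
print(opposing_site_substrate("GCGTGGGCG", spacer="AT"))
```

With no spacer, the resulting top strand equals its own reverse complement, so the same binding segment reads 5' to 3' on both strands.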

Typically, phage surviving selection are subjected to alternate cycles of 
amplification and selection by the same assay to increase the degree of enrichment for 
phage bearing dimerizing peptides. Amplification is achieved by reinfection of host cells 
following selection. Optionally, the stringency of selection can be increased in successive 
rounds. Optionally, peptides encoded by phage surviving one round of selection can
serve as kernel sequences for further mutagenesis. For example, in some methods, the 
nucleic acid encoding an isolated peptide is mutagenized such that at least one, and
sometimes 10, 20, 33 or 50%, of the amino acids are varied, and phage encoding the
variant peptides are used as the starting materials in a subsequent round of selection. 






Eventually, clonal isolates of phage surviving selection are picked. The segment of such 
phage genomes encoding the peptide moiety is sequenced to reveal the identity of the 
dimerizing peptide. 
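
The optional variation of a kernel sequence between rounds of selection can be sketched at the amino acid level as follows. This is an illustration only: in practice the encoding nucleic acid is mutagenized, whereas the sketch substitutes residues directly. The function name, the kernel sequence, and the rounding rule are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def mutagenize(kernel, fraction, rng):
    """Return a variant of `kernel` with round(fraction * length) positions changed.

    At least one position is always changed, and each changed position
    receives a residue different from the original.
    """
    n_sites = max(1, round(fraction * len(kernel)))
    positions = rng.sample(range(len(kernel)), n_sites)
    residues = list(kernel)
    for pos in positions:
        choices = [aa for aa in AMINO_ACIDS if aa != residues[pos]]
        residues[pos] = rng.choice(choices)
    return "".join(residues)

# Hypothetical 10-residue kernel with 20% of positions varied.
rng = random.Random(0)
print(mutagenize("ACDEFGHIKL", 0.20, rng))
```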

Dimerizing peptides selected by phage display are useful for mediating 
multimerization of zinc finger proteins or other types of protein. A typical application of
such peptides is to mediate association of two different zinc finger proteins that have 
proximate target segments within a target sequence. For example, each of the two zinc 
finger proteins can be a three finger protein with affinity for a 9 base target segment, and 
the respective target segments can be adjacent or within about 10 or preferably 5
nucleotides of each other in a target sequence. Expression constructs are designed to
express the two proteins linked to first and second dimerizing peptide sequences. In some 
applications, the dimerizing peptides are linked to opposite ends of the two zinc finger 
proteins so that the peptides are proximate to each other when the two zinc fingers are 
bound to their respective target segments. For example, if a first zinc finger protein and a
second zinc finger protein make their primary contacts with the same strand, and the first
zinc finger protein binds 5' relative to the second zinc finger protein on this strand, then
typically a peptide is linked to the N terminus of the first zinc finger protein and the C terminus
of the second zinc finger protein. Alternatively, first and second zinc finger proteins can be
designed to bind to target segments on opposite strands of a double stranded target
segment. In this situation, dimerizing peptides are included at the same terminus (either
N or C) of the first and second zinc finger proteins. 

Each of the expressed first and second zinc finger proteins linked to a 
dimerizing peptide can then bind to its target segment. The two proteins can also bind to 
each other via the dimerizing peptides. Such binding can occur before or after the two
proteins bind to their respective target segments. Associating the two proteins through
the dimerizing peptides results in cooperative binding of the two proteins to their 
proximate target segments, thereby increasing the affinity and/or specificity of binding 
relative to the independent binding of the zinc finger proteins to their respective target 
segments. 

Zinc finger proteins linked to dimerizing peptides can be used in methods
of regulating and detecting target sequences as described in more detail below. The
binding specificity of linked zinc fingers is the aggregate of that of the component 
fingers. Linkage of two zinc finger proteins is advantageous for conferring a unique 
binding specificity within a mammalian genome. A typical mammalian diploid genome
consists of 3 x 10^9 bp. Assuming that the four nucleotides A, C, G, and T are randomly
distributed, a given 9 bp sequence is present ~23,000 times. Thus, a ZFP recognizing a 9
bp target with absolute specificity would have the potential to bind to ~23,000 sites within
the genome. An 18 bp sequence is present once in 3.4 x 10^10 bp, or about once in a
random DNA sequence whose complexity is ten times that of a mammalian genome.
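
The figures above follow from simple arithmetic, sketched below. The factor of two for counting both strands is an assumption made here because it reproduces the stated values; the function names are hypothetical.

```python
def expected_occurrences(genome_bp, site_len, strands=2):
    """Expected exact matches to one site in random DNA with equal base frequencies."""
    return strands * genome_bp / 4 ** site_len

def bp_per_occurrence(site_len, strands=2):
    """Length of random double-stranded DNA expected to contain the site once."""
    return 4 ** site_len / strands

print(round(expected_occurrences(3e9, 9)))   # 22888, i.e. about 23,000
print(f"{bp_per_occurrence(18):.2e}")        # 3.44e+10
```

Doubling the length of the target from 9 to 18 bases therefore reduces the expected number of matching sites in a genome of 3 x 10^9 bp from about 23,000 to fewer than one.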

Different zinc finger proteins can be used preassociated or can be used 
separately, in which case they associate in situ. Often zinc finger proteins linked to
dimerizing peptides of the invention remain dissociated in solution, and dimerize only
on binding to DNA. This is advantageous in promoting dimerization between two
different zinc finger proteins linked to the dimerizing peptides relative to
homodimerization of two copies of the same zinc finger protein. For example, if a
target sequence contains adjacent sites for two different zinc finger proteins, both zinc 
finger proteins can bind simultaneously to the target sequence, and then dimerize with 
each other mediated by the linked dimerizing peptide. By contrast, two copies of the
same zinc finger cannot usually bind adjacent to each other on the same target sequence
(unless by coincidence the target contains an inverted repeat of the target site for that zinc 
finger). Accordingly, multiple copies of the same zinc finger do not typically 
homodimerize with each other unless the target is designed or selected specifically so that 
such dimerization should occur. For in vivo applications, zinc finger proteins and linked
dimerizing peptides are typically administered indirectly by contacting cells or organisms
with an expression vector encoding one or more zinc finger proteins and linked 
dimerizing peptides. The expression vector is introduced into the cell and expresses the 
one or more zinc finger proteins and linked dimerizing peptides within the cell. For in 
vitro applications, such as diagnostics, associated zinc finger proteins are typically used
directly in the protein form. In both in vivo and in vitro applications, use of nonnaturally
occurring peptides to mediate dimerization offers the advantage relative to natural 
dimerizing peptides, such as fos and jun, in that nonnatural peptides are unlikely to 
crossreact with natural proteins within a cell. 
II. Zinc Finger Proteins

Zinc finger proteins are formed from zinc finger components. For 
example, zinc finger proteins can have one to thirty-seven fingers, commonly having 2, 3, 
4, 5 or 6 fingers. A zinc finger protein recognizes and binds to a target site (sometimes 
referred to as a target segment) that represents a relatively small subsequence within a
target gene. Each component finger of a zinc finger protein can bind to a subsite within 
the target site. The subsite includes a triplet of three contiguous bases all on the same 
strand (sometimes referred to as the target strand). The subsite may or may not also 
include a fourth base on the opposite strand that is the complement of the base
immediately 3' of the three contiguous bases on the target strand (see Fig. 1). In many
zinc finger proteins, a zinc finger binds to its triplet subsite substantially independently of
other fingers in the same zinc finger protein. Accordingly, the binding specificity of a zinc
finger protein containing multiple fingers is usually approximately the aggregate of the
specificities of its component fingers. For example, if a zinc finger protein is formed
from first, second and third fingers that individually bind to triplets XXX, YYY, and
ZZZ, the binding specificity of the zinc finger protein is 3'XXX YYY ZZZ5'.

The relative order of fingers in a zinc finger protein from N-terminal to
C-terminal determines the relative order of triplets in the 3' to 5' direction in the target. For
example, if a zinc finger protein comprises from N-terminal to C-terminal the first,
second and third fingers mentioned above, then the zinc finger protein binds to the target
segment 3'XXXYYYZZZ5'. If the zinc finger protein comprises the fingers in another
order, for example, second finger, first finger, third finger, then the zinc finger protein
binds to a target segment comprising a different permutation of triplets, in this example,
3'YYYXXXZZZ5' (see Berg & Shi, Science 271, 1081-1086 (1996)). The assessment of
binding properties of a zinc finger protein as the aggregate of its component fingers is,
however, only approximate, due to context-dependent interactions of multiple fingers
binding in the same protein.
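
The relationship between finger order and target segment can be expressed as a simple concatenation. The sketch below assumes that each triplet is itself written 3' to 5', as in the notation above, and that fingers bind independently; the function names are hypothetical.

```python
def target_3to5(finger_triplets):
    """Concatenate triplets for fingers listed N-terminal to C-terminal.

    The result reads 3'->5' along the target strand.
    """
    return "".join(finger_triplets)

def target_5to3(finger_triplets):
    """Same target segment written in the conventional 5'->3' direction."""
    return target_3to5(finger_triplets)[::-1]

print("3'" + target_3to5(["XXX", "YYY", "ZZZ"]) + "5'")
print("3'" + target_3to5(["YYY", "XXX", "ZZZ"]) + "5'")
```

Because the aggregate is only approximate, the output of such a function is best treated as a starting point for design rather than a prediction of the actual binding site.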

Two or more zinc finger proteins can be linked either covalently or by 
dimerization to have a target specificity that is the aggregate of that of the component
zinc finger proteins (see e.g., Kim & Pabo, PNAS 95, 2812-2817 (1998)). For example, a
first zinc finger protein having first, second and third component fingers that respectively
bind to XXX, YYY and ZZZ can be linked to a second zinc finger protein having first,
second and third component fingers with binding specificities AAA, BBB and CCC. The
binding specificity of the combined first and second proteins is thus
3'XXXYYYZZZ___AAABBBCCC5', where the underline indicates a short intervening
region (typically 0-5 bases of any type). In this situation, the target site can be viewed as
comprising two target segments separated by an intervening segment. Linkage by
dimerizing peptides has been discussed above. Covalent linkage can be accomplished
using any of the following peptide linkers.

TGEKP (Liu et al., 1997, supra); (G4S)n (Kim et al., PNAS 93,
1156-1160 (1996)); GGRRGGGS; LRQRDGERP; LRQKDGGGSERP; LRQKD(G3S)2ERP.
Alternatively, flexible linkers can be rationally designed using computer programs
capable of modeling both DNA-binding sites and the peptides themselves, or by phage
display methods. In a further variation, noncovalent linkage can be achieved by fusing
two zinc finger proteins with domains promoting heterodimer formation of the two zinc
finger proteins. For example, one zinc finger protein can be fused with fos and the other
with jun (see Barbas et al., WO 95/19431).
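
The covalent linkage described above can be sketched as simple string assembly. The following Python fragment is an illustration only, under stated assumptions: the shorthand G4S and G3S is expanded as GGGGS and GGGS, the repeat count n = 3 for (G4S)n is chosen arbitrarily, and the function names are hypothetical. It also builds the combined target site with an intervening region of 0-5 bases, written as N.

```python
G4S = "GGGGS"  # assumed expansion of the shorthand G4S
G3S = "GGGS"   # assumed expansion of the shorthand G3S

# Linkers named in the text, written out as one-letter amino acid strings.
LINKERS = {
    "TGEKP": "TGEKP",
    "(G4S)3": G4S * 3,  # n = 3 is an arbitrary illustrative choice
    "GGRRGGGS": "GGRRGGGS",
    "LRQRDGERP": "LRQRDGERP",
    "LRQKDGGGSERP": "LRQKDGGGSERP",
    "LRQKD(G3S)2ERP": "LRQKD" + G3S * 2 + "ERP",
}


def fuse(first_zfp, linker_name, second_zfp):
    """Return the sequence of two ZFPs joined in frame by a named linker."""
    return first_zfp + LINKERS[linker_name] + second_zfp


def combined_target_3to5(first_site, second_site, gap=0):
    """Join two target segments (3'->5') around an intervening region of N bases."""
    if not 0 <= gap <= 5:
        raise ValueError("intervening region is typically 0-5 bases")
    return first_site + "N" * gap + second_site


fusion = fuse("AAA", "TGEKP", "CCC")  # placeholder protein sequences
site = combined_target_3to5("XXXYYYZZZ", "AAABBBCCC", gap=2)
```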

A component finger of a zinc finger protein typically contains about 30
amino acids and has the following motif (N-C):

Cys-(X)2-4-Cys-X.X.X.X.X.X.X.X.X.X.X.X-His-(X)3-5-His
                        -1 1 2 3 4 5 6  7

The two invariant histidine residues and two invariant cysteine residues in
a single beta turn are co-ordinated through zinc (see, e.g., Berg & Shi, Science 271,
1081-1085 (1996)). The above motif shows a numbering convention that is standard in the
field for the region of a zinc finger conferring binding specificity. The amino acid on the
left (N-terminal side) of the first invariant His residue is assigned the number +6, and
other amino acids further to the left are assigned successively decreasing numbers. The
alpha helix begins at residue 1 and extends to the residue following the second conserved
histidine. The entire helix is therefore of variable length, between 11 and 13 residues.
The process of designing or selecting a nonnaturally occurring or variant ZFP typically 
starts with a natural ZFP as a source of framework residues. The process of design or 
selection serves to define nonconserved positions (i.e., positions -1 to +6) so as to confer 
a desired binding specificity. One suitable ZFP is the DNA binding domain of the mouse
transcription factor Zif268. The DNA binding domain of this protein has the amino acid
sequence: 

YACPVESCDRRFSRSDELTRHIRIHTGQKP (F1)
FQCRICMRNFSRSDHLTTHIRTHTGEKP (F2)
FACDICGRKFARSDERKRHTKIHLRQK (F3)
and binds to a target 5'GCG TGG GCG3'.

Another suitable natural zinc finger protein as a source of framework 
residues is Sp-1. The Sp-1 sequence used for construction of zinc finger proteins 
corresponds to amino acids 531 to 624 in the Sp-1 transcription factor. This sequence is 
94 amino acids in length. The amino acid sequence of Sp-1 is as follows:

PGKKKQHICHIQGCGKVYGKTSHLRAHLRWHTGERP
FMCTWSYCGKRFTRSDELQRHKRTHTGEKK
FACPECPKRFMRSDHLSKHIKTHQNKKG

Sp-1 binds to a target site 5'GGG GCG GGG3'. 

An alternate form of Sp-1, an Sp-1 consensus sequence, has the following 
amino acid sequence: 

meklrngsgd
PGKKKQHACPECGKSFSKSSHLRAHQRTHTGERP
YKCPECGKSFSRSDELQRHQRTHTGEKP
YKCPECGKSFSRSDHLSKHQRTHQNKKG (lower case letters are a
leader sequence from Shi & Berg, Chemistry and Biology 1, 83-89 (1995)). The optimal
binding sequence for the Sp-1 consensus sequence is 5'GGGGCGGGG3'. Other suitable
ZFPs are described below.
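
The motif and numbering convention set out above can be checked against the three Zif268 fingers listed earlier. The following Python fragment is a minimal sketch, not part of the original disclosure: the regular expression is one assumed transcription of the motif (twelve residues between the second Cys and the first His), and the function and variable names are hypothetical. It extracts the residues at positions -1 to +6 of each finger.

```python
import re

# Cys-(X)2-4-Cys-(X)12-His-(X)3-5-His; the group captures the 12 residues
# between the second invariant Cys and the first invariant His.
C2H2 = re.compile(r"C.{2,4}C(.{12})H.{3,5}H")

POSITIONS = [-1, 1, 2, 3, 4, 5, 6]


def recognition_positions(finger):
    """Map positions -1..+6 to residues for a single C2H2 finger sequence."""
    match = C2H2.search(finger)
    if match is None:
        raise ValueError("sequence does not match the C2H2 motif")
    between = match.group(1)
    # Position +6 is the residue immediately N-terminal to the first His,
    # so positions -1..+6 are the last seven of the twelve residues.
    return dict(zip(POSITIONS, between[-7:]))


ZIF268 = {
    "F1": "YACPVESCDRRFSRSDELTRHIRIHTGQKP",
    "F2": "FQCRICMRNFSRSDHLTTHIRTHTGEKP",
    "F3": "FACDICGRKFARSDERKRHTKIHLRQK",
}

helices = {
    name: "".join(recognition_positions(seq).values())
    for name, seq in ZIF268.items()
}
```

For these three sequences the fragment yields RSDELTR, RSDHLTT and RSDERKR for positions -1 to +6 of F1, F2 and F3 respectively.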

There are a number of substitution rules that assist rational design of some 
zinc finger proteins (see Desjarlais & Berg, PNAS 90, 2256-2260 (1993); Choo & Klug, 
PNAS 91, 11163-11167 (1994); Desjarlais & Berg, PNAS 89, 7345-7349 (1992);
Jamieson et al., supra; Choo et al., WO 98/53057, WO 98/53058; WO 98/53059; WO
98/53060). Many of these rules are supported by site-directed mutagenesis of the three-
finger domain of the ubiquitous transcription factor, Sp-1 (Desjarlais and Berg, 1992;
1993). One of these rules is that a 5' G in a DNA triplet can be bound by a zinc finger
incorporating arginine at position 6 of the recognition helix. Another substitution rule is
that a G in the middle of a subsite can be recognized by including a histidine residue at
position 3 of a zinc finger. A further substitution rule is that asparagine can be
incorporated to recognize A in the middle of a triplet, aspartic acid, glutamic acid, serine or
threonine can be incorporated to recognize C in the middle of a triplet, and amino acids
with small side chains such as alanine can be incorporated to recognize T in the middle of
a triplet. A further substitution rule is that the 3' base of a triplet subsite can be recognized
by incorporating the following amino acids at position -1 of the recognition helix:
arginine to recognize G, glutamine to recognize A, glutamic acid (or aspartic acid) to
recognize C, and threonine to recognize T. Although these substitution rules are useful
in designing zinc finger proteins, they do not take into account all possible target sites.
Furthermore, the assumption underlying the rules, namely that a particular amino acid in
a zinc finger is responsible for binding to a particular base in a subsite, is only
approximate. Context-dependent interactions between proximate amino acids in a finger
or binding of multiple amino acids to a single base or vice versa can cause variation of the
binding specificities predicted by the existing substitution rules.
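
The substitution rules recited above can be collected into a lookup table. The following Python fragment is a hedged sketch rather than a design tool: it encodes only the rules stated in this paragraph, returns an empty list where no rule is stated (for example, a 5' base other than G), and uses hypothetical names throughout. As the text cautions, the suggestions are approximate and ignore context-dependent effects.

```python
# Rules as recited in the text; triplets are read 5'->3'.
POSITION_6_FOR_5PRIME_BASE = {"G": ["Arg"]}
POSITION_3_FOR_MIDDLE_BASE = {
    "G": ["His"],
    "A": ["Asn"],
    "C": ["Asp", "Glu", "Ser", "Thr"],
    "T": ["Ala"],  # alanine stands in for "small side chains such as alanine"
}
POSITION_MINUS1_FOR_3PRIME_BASE = {
    "G": ["Arg"],
    "A": ["Gln"],
    "C": ["Glu", "Asp"],
    "T": ["Thr"],
}


def suggest_residues(triplet_5to3):
    """Return candidate residues at helix positions 6, 3 and -1 for a triplet."""
    if len(triplet_5to3) != 3:
        raise ValueError("a subsite triplet has exactly three bases")
    five_prime, middle, three_prime = triplet_5to3.upper()
    return {
        6: POSITION_6_FOR_5PRIME_BASE.get(five_prime, []),
        3: POSITION_3_FOR_MIDDLE_BASE.get(middle, []),
        -1: POSITION_MINUS1_FOR_3PRIME_BASE.get(three_prime, []),
    }


gcg = suggest_residues("GCG")
```

For the triplet GCG the table suggests Arg at position 6, one of Asp, Glu, Ser or Thr at position 3, and Arg at position -1, which is consistent with the Zif268 finger 1 helix (Arg at -1, Glu at 3, Arg at 6) listed earlier.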

Zinc finger proteins are often expressed with a heterologous domain as 
fusion proteins. Common domains for addition to the ZFP include, e.g., transcription
factor domains (activators, repressors, co-activators, co-repressors), silencers, oncogenes
(e.g., myc, jun, fos, myb, max, mad, rel, ets, bcl, mos family members etc.); DNA
repair enzymes and their associated factors and modifiers; DNA rearrangement enzymes
and their associated factors and modifiers; chromatin associated proteins and their
modifiers (e.g., kinases, acetylases and deacetylases); and DNA modifying enzymes (e.g.,
methyltransferases, topoisomerases, helicases, ligases, kinases, phosphatases,
polymerases, endonucleases) and their associated factors and modifiers. A preferred
domain for fusing with a ZFP when the ZFP is to be used for repressing expression of a
target gene is the KRAB repression domain from the human KOX-1 protein (Thiesen et
al., New Biologist 2, 363-374 (1990); Margolin et al., Proc. Natl. Acad. Sci. USA 91,
4509-4513 (1994); Pengue et al., Nucl. Acids Res. 22:2908-2914 (1994); Witzgall et al.,
Proc. Natl. Acad. Sci. USA 91, 4514-4518 (1994)). Preferred domains for achieving
activation include the HSV VP16 activation domain (see, e.g., Hagmann et al., J. Virol.
71, 5952-5962 (1997)); nuclear hormone receptors (see, e.g., Torchia et al., Curr. Opin.
Cell. Biol. 10:373-383 (1998)); the p65 subunit of nuclear factor kappa B (Bitko & Barik,
J. Virol. 72:5610-5618 (1998) and Doyle & Hunt, Neuroreport 8:2937-2942 (1997); Liu
et al., Cancer Gene Ther. 5:3-28 (1998)), or artificial chimeric functional domains such as
VP64 (Seipel et al., EMBO J. 11, 4961-4968 (1992)).

An important factor in the administration of polypeptide compounds, such 
as the ZFPs, is ensuring that the polypeptide has the ability to traverse the plasma
membrane of a cell, or the membrane of an intra-cellular compartment such as the
nucleus. Cellular membranes are composed of lipid-protein bilayers that are freely
permeable to small, nonionic lipophilic compounds and are inherently impermeable to
polar compounds, macromolecules, and therapeutic or diagnostic agents. However,
proteins and other compounds such as liposomes have been described, which have the
ability to translocate polypeptides such as ZFPs across a cell membrane.

For example, "membrane translocation polypeptides" have amphiphilic or 
hydrophobic amino acid subsequences that have the ability to act as membrane-
translocating carriers. In one embodiment, homeodomain proteins have the ability to
translocate across cell membranes. The shortest internalizable peptide of a homeodomain
protein, Antennapedia, was found to be the third helix of the protein, from amino acid
position 43 to 58 (see, e.g., Prochiantz, Current Opinion in Neurobiology 6:629-634
(1996)). Another subsequence, the h (hydrophobic) domain of signal peptides, was found
to have similar cell membrane translocation characteristics (see, e.g., Lin et al., J. Biol.
Chem. 270:14255-14258 (1995)).

Examples of peptide sequences which can be linked to a ZFP of the 
invention, for facilitating uptake of ZFP into cells, include, but are not limited to: an 11
amino acid peptide of the tat protein of HIV; a 20 residue peptide sequence which
corresponds to amino acids 84-103 of the p16 protein (see Fahraeus et al., Current
Biology 6:84 (1996)); the third helix of the 60-amino acid long homeodomain of
Antennapedia (Derossi et al., J. Biol. Chem. 269:10444 (1994)); the h region of a signal
peptide such as the Kaposi fibroblast growth factor (K-FGF) h region (Lin et al., supra);
or the VP22 translocation domain from HSV (Elliot & O'Hare, Cell 88:223-233 (1997)).
Other suitable chemical moieties that provide enhanced cellular uptake may also be
chemically linked to ZFPs.

Toxin molecules also have the ability to transport polypeptides across cell 
membranes. Often, such molecules are composed of at least two parts (called "binary
toxins"): a translocation or binding domain or polypeptide and a separate toxin domain or
polypeptide. Typically, the translocation domain or polypeptide binds to a cellular
receptor, and then the toxin is transported into the cell. Several bacterial toxins, including
Clostridium perfringens iota toxin, diphtheria toxin (DT), Pseudomonas exotoxin A (PE),
pertussis toxin (PT), Bacillus anthracis toxin, and pertussis adenylate cyclase (CYA),
have been used in attempts to deliver peptides to the cell cytosol as internal or amino-
terminal fusions (Arora et al., J. Biol. Chem., 268:3334-3341 (1993); Perelle et al., Infect.
Immun., 61:5147-5156 (1993); Stenmark et al., J. Cell Biol. 113:1025-1032 (1991);
Donnelly et al., PNAS 90:3530-3534 (1993); Carbonetti et al., Abstr. Annu. Meet. Am.
Soc. Microbiol. 95:295 (1995); Sebo et al., Infect. Immun. 63:3851-3857 (1995); Klimpel
et al., PNAS U.S.A. 89:10277-10281 (1992); and Novak et al., J. Biol. Chem. 267:17186-
17193 (1992)).

Such subsequences can be used to translocate ZFPs across a cell 
membrane. ZFPs can be conveniently fused to or derivatized with such sequences. 
Typically, the translocation sequence is provided as part of a fusion protein. Optionally, a
linker can be used to link the ZFP and the translocation sequence. Any suitable linker can 
be used, e.g., a peptide linker. 

II. Phage Display Method

The technique of phage display has provided a largely empirical means of
generating zinc finger proteins with a desired target specificity (see e.g., Rebar, US
5,789,538; Choo et al., WO 96/06166; Barbas et al., WO 95/19431 and WO 98/543111;
Jamieson et al., supra). The method can be used in conjunction with, or as an alternative
to, rational design.

In the present invention, phage display is used for selection of linkers. The
method involves the generation of diverse libraries of peptides, typically linked to the
same zinc finger protein, followed by affinity selection for phage bearing peptides with
dimerizing activity. To use this method, the experimenter typically proceeds as follows.
First, a gene for a zinc finger protein binding a known target segment is selected. A
target sequence is then designed bearing two copies of the target segment in opposing
orientations. The two copies can be immediately adjacent or separated by up to about 5
nucleotides. Next, a library of nucleic acid segments encoding potential dimerizing
peptides is provided. This library can be a completely random peptide library, or can
represent variants of a known sequence or can include a number of known peptide
sequences. A phage or phagemid expression vector is then engineered to encode a fusion
protein comprising an outer surface phage coat protein or fragment thereof, a potential
dimerizing peptide, and the zinc finger protein. The potential dimerizing peptide varies
between different library members whereas the zinc finger protein is the same. The
dimerizing peptide and zinc finger protein can be linked in either order to the phage coat
protein. Typically, the phage coat protein is pIII of a filamentous phage. The zinc
finger gene and segment encoding the potential dimerizing peptide are inserted between
segments of gene III encoding the membrane export signal peptide and the remainder of
pIII, so that the zinc finger protein is expressed as an amino-terminal fusion with pIII
in the mature, processed protein. When using phagemid vectors, the zinc finger gene and
potential dimerizing peptide can also be fused to a truncated version of gene III encoding,
minimally, the C-terminal region required for assembly of pIII into the phage particle.
The resultant vector library is transformed into E. coli and used to produce filamentous
phage which express variant peptides linked to a constant zinc finger protein on their
surface as fusions with the coat protein pIII. If a phagemid vector is used, then this
step requires superinfection with helper phage. The phage library is then incubated with
the target DNA sequence, and affinity selection methods are used to isolate phage which bind
the target with high affinity from bulk phage. Typically, the DNA target is immobilized on a
solid support, which is then washed under conditions sufficient to remove all but the
tightest binding phage. After washing, any phage remaining on the support are recovered
via elution under conditions which disrupt zinc finger-DNA binding. Recovered phage
are used to infect fresh E. coli, which is then amplified and used to produce a new batch
of phage particles. Selection and amplification are then repeated as many times as is
necessary to enrich the phage pool for tight binders such that these may be identified
using sequencing and/or screening methods. Although the method is illustrated for pIII
fusions, analogous principles can be used to screen ZFP variants as pVIII fusions.
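
The target design step of the method above (two copies of the target segment in opposing orientations, adjacent or separated by up to about 5 nucleotides) can be sketched as follows. This Python fragment is illustrative only. It assumes that "opposing orientations" means an inverted repeat on one strand, with the second copy written as the reverse complement of the first; the text does not say whether a head-to-head or tail-to-tail arrangement is intended. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def dimer_selection_target(segment, spacer=""):
    """Build one strand (5'->3') bearing two copies of a segment as an inverted repeat.

    The spacer is the intervening sequence between the copies (0 to about 5 nt).
    """
    if len(spacer) > 5:
        raise ValueError("copies should be separated by at most about 5 nucleotides")
    return segment.upper() + spacer.upper() + reverse_complement(segment)


# Example using the Zif268 target segment given earlier and an arbitrary spacer.
target = dimer_selection_target("GCGTGGGCG", spacer="AT")
```

Because the second copy is the reverse complement of the first, the same zinc finger protein can in principle bind one copy on each strand, which is the situation in which a dimerizing peptide would be expected to contribute to affinity.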

Eukaryotic viruses can be used to display polypeptides in an analogous 
manner. For example, display of human heregulin fused to gp70 of Moloney murine 
leukemia virus has been reported by Han et al., Proc. Natl. Acad. Sci. USA 92:9141-9151
(1995). Spores can also be used as replicable genetic packages. In this case, polypeptides
are displayed from the outer surface of the spore. For example, spores from B. subtilis
have been reported to be suitable. Sequences of coat proteins of these spores are provided
by Donovan et al., J. Mol. Biol. 196:1-10 (1987). Eukaryotic or bacterial cells can also
be used as replicable genetic packages. Polypeptides to be displayed are inserted into a
gene encoding a cell protein that is expressed on the cell's surface. Yeast and bacterial
cells including Salmonella typhimurium, Bacillus subtilis, Pseudomonas aeruginosa,
Vibrio cholerae, Klebsiella pneumoniae, Neisseria gonorrhoeae, Neisseria meningitidis,
Bacteroides nodosus, Moraxella bovis, and especially Escherichia coli are preferred.
Details of outer surface proteins are discussed by Ladner et al., US 5,571,698, and
Georgiou et al., Nature Biotechnology 15:29-34 (1997) and references cited therein. For
example, the lamB protein of E. coli is suitable.

III. Selection of Target Gene

Zinc finger proteins can be used to modulate the expression of any target 
polynucleotide sequence. The sequence can be for example, genomic, cDNA or RNA or 
an expressed sequence tag (EST). Typically, the target polynucleotide includes a gene or 
a fragment thereof. The term gene is used broadly to include, for example, exonic
regions, intronic regions, 5' UTRs, 3' UTRs, 5' flanking sequences, 3' flanking
sequences, promoters, enhancers, transcription start sites, ribosome binding sites,
regulatory sites, and poly-adenylation sites. Target genes can be cellular, viral or from other
sources including purely theoretical sequences. Target gene sequences can be obtained
from databases, such as GenBank, the published literature or can be obtained de novo.
Target genes include genes from pathological viruses and microorganisms for which
repression of expression can be used to abort infection. Examples of pathogenic viruses
include hepatitis (A, B, or C), herpes virus (e.g., VZV, HSV-1, HSV-6, HSV-II, and
CMV, Epstein Barr virus), HIV, ebola, adenovirus, influenza virus, flaviviruses,
echovirus, rhinovirus, coxsackie virus, coronavirus, respiratory syncytial virus, mumps
virus, rotavirus, measles virus, rubella virus, parvovirus, vaccinia virus, HTLV virus,
dengue virus, papillomavirus, molluscum virus, poliovirus, rabies virus, JC virus and
arboviral encephalitis virus. Some examples of pathogenic bacteria include chlamydia,
rickettsial bacteria, mycobacteria, staphylococci, streptococci, pneumococci,
meningococci and gonococci, klebsiella, proteus, serratia, pseudomonas, legionella,
diphtheria, salmonella, bacilli, cholera, tetanus, botulism, anthrax, plague, leptospirosis,
and Lyme disease bacteria.

Target genes also include genes from human or other mammals that 
contribute to disease. Some such genes are oncogenes, tumor suppressors or growth
factors that contribute to cancer. Examples of oncogenes include hMSH2 (Fishel et al.,
Cell 75, 1027-1038 (1993)) and hMLH1 (Papadopoulos et al., Science 263, 1625-1628
(1994)). Some examples of growth factors include fibroblast growth factor, platelet-
derived growth factor, GM-CSF, VEGF, EPO, Erb-B2, and hGH. Other human genes
contribute to disease by rendering a subject susceptible to infection by a microorganism
or virus. For example, certain alleles of the gene encoding the CCR5 receptor render a
subject susceptible to infection by HIV. Other human genes, such as that encoding
amyloid precursor protein or ApoE, contribute to other diseases, such as Alzheimer's
disease.

Target genes also include genes of human or other mammals that provide
defense mechanisms against diseases due to other sources. For example, tumor repressor
genes provide protection against cancer. Expression of such genes is desirable and zinc
finger proteins are used to activate expression.

Target genes also include genes that are normally turned off or expressed
at low levels but which through activation can be used to substitute for another defective
gene present in some individuals. For example, the fetal hemoglobin genes, which are
normally inactive in adult humans, can be activated to substitute for the defective beta-
globin gene in individuals with sickle cell anemia.

Target genes also include plant genes for which repression or activation
leads to an improvement in plant characteristics, such as improved crop production,
disease or herbicide resistance. For example, repression of expression of the FAD2-1
gene results in an advantageous increase in oleic acid and decrease in linoleic and linolenic
acids.

Once a target gene has been determined, target segments within the gene
are selected which are to be bound by zinc finger proteins. Typically, two target
segments are selected within the same gene to be bound by two zinc finger proteins to be
associated by dimerizing peptides. Typically, the two segments are each of 9 or 10 bases
and are adjacent or within about 5 nucleotides of each other. Criteria for selecting target
segments are described in 09/229,007 filed January 12, 1999 (incorporated by reference
in its entirety for all purposes).
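
The spacing criterion above (two segments of 9 or 10 bases that are adjacent or within about 5 nucleotides of each other) lends itself to a simple filter. The following Python fragment is a sketch under stated assumptions: candidate segments are represented only by their 0-based start coordinates on one strand, all segments have the same length, and the function name is hypothetical. The selection criteria of the referenced application are not reproduced here.

```python
def paired_segments(starts, seg_len=9, max_gap=5):
    """List (first_start, second_start, gap) for non-overlapping segment pairs.

    gap is the number of bases between the end of the first segment and the
    start of the second; 0 means the segments are immediately adjacent.
    """
    if seg_len not in (9, 10):
        raise ValueError("segments are typically 9 or 10 bases")
    ordered = sorted(starts)
    pairs = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            gap = second - (first + seg_len)
            if 0 <= gap <= max_gap:
                pairs.append((first, second, gap))
    return pairs


# Hypothetical candidate start positions within a target gene.
pairs = paired_segments([0, 9, 12, 30])
```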

III. Production of ZFPs and Dimerizing Peptides 

ZFP polypeptides, dimerizing peptides linked to the same, and nucleic
acids encoding fusion proteins of ZFPs and dimerizing peptides can be made using
routine techniques in the field of recombinant genetics. Basic texts disclosing the general
methods of use in this invention include Sambrook et al., Molecular Cloning, A
Laboratory Manual (2nd ed. 1989); Kriegler, Gene Transfer and Expression: A
Laboratory Manual (1990); and Current Protocols in Molecular Biology (Ausubel et al.,
eds., 1994). In addition, nucleic acids less than about 100 bases can be custom ordered
from any of a variety of commercial sources, such as The Midland Certified Reagent
Company (mcrc@oligos.com), The Great American Gene Company
(http://www.genco.com), ExpressGen Inc. (www.expressgen.com), and Operon Technologies
Inc. (Alameda, CA). Similarly, peptides can be custom ordered from any of a variety of
sources, such as PeptidoGenic (pkim@ccnet.com), HTI Bio-products, Inc.
(http://www.htibio.com), BMA Biomedicals Ltd (U.K.), and Bio.Synthesis, Inc.

Oligonucleotides can be chemically synthesized according to the solid 
phase phosphoramidite triester method first described by Beaucage & Caruthers,
Tetrahedron Letts. 22:1859-1862 (1981), using an automated synthesizer, as described in
Van Devanter et al., Nucleic Acids Res. 12:6159-6168 (1984). Purification of
oligonucleotides is by either denaturing polyacrylamide gel electrophoresis or by reverse
phase HPLC. The sequence of the cloned genes and synthetic oligonucleotides can be
verified after cloning using, e.g., the chain termination method for sequencing double-
stranded templates of Wallace et al., Gene 16:21-26 (1981).

Two alternative methods are typically used to create the coding sequences 
required to express DNA-binding peptides. One protocol is a PCR-based assembly 
procedure that utilizes six overlapping oligonucleotides (Fig. 2). Three oligonucleotides 
(oligos 1, 3, and 5 in Fig. 2) correspond to "universal" sequences that encode portions
of the DNA-binding domain between the recognition helices. These oligonucleotides
typically remain constant for all zinc finger constructs. The other three "specific"
oligonucleotides (oligos 2, 4, and 6 in Fig. 2) are designed to encode the recognition
helices. These oligonucleotides contain substitutions primarily at positions -1, 2, 3 and 6
on the recognition helices, making them specific for each of the different DNA-binding
domains.

The PCR synthesis is carried out in two steps. First, a double stranded 
DNA template is created by combining the six oligonucleotides (three universal, three 
specific) in a four cycle PCR reaction with a low temperature annealing step, thereby 
annealing the oligonucleotides to form a DNA "scaffold." The gaps in the scaffold are
filled in by high-fidelity thermostable polymerase; the combination of Taq and Pfu
polymerases also suffices. In the second phase of construction, the zinc finger template is
amplified by external primers designed to incorporate restriction sites at either end for 
cloning into a shuttle vector or directly into an expression vector. 

An alternative method of cloning a DNA-binding protein relies on
annealing complementary oligonucleotides encoding the specific regions of the desired
ZFP. This particular application requires that the oligonucleotides be phosphorylated
prior to the final ligation step. This is usually performed before setting up the annealing
reactions. In brief, the "universal" oligonucleotides encoding the constant regions of the
proteins (oligos 1, 2 and 3 of above) are annealed with their complementary
oligonucleotides. Additionally, the "specific" oligonucleotides encoding the finger
recognition helices are annealed with their respective complementary oligonucleotides.
These complementary oligos are designed to fill in the region which was previously filled
in by polymerase in the above-mentioned protocol. The complementary oligos to the
common oligos 1 and finger 3 are engineered to leave overhanging sequences specific for
the restriction sites used in cloning into the vector of choice in the following step. The
second assembly protocol differs from the initial protocol in the following aspects: the
"scaffold" encoding the newly designed ZFP is composed entirely of synthetic DNA,
thereby eliminating the polymerase fill-in step; additionally, the fragment to be cloned into
the vector does not require amplification. Lastly, the design of leaving sequence-specific
overhangs eliminates the need for restriction enzyme digests of the inserting fragment.
Alternatively, changes to ZFP recognition helices can be created using conventional site-
directed mutagenesis methods.

Both assembly methods require that the resulting fragment encoding the
newly designed ZFP be ligated into a vector. Ultimately, the ZFP-encoding sequence is
cloned into an expression vector. Optionally, a nucleic acid segment encoding a
dimerizing peptide can be cloned into the vector so as to be expressed in frame with the
ZFP. Expression vectors that are commonly utilized include a modified pMAL-c2
bacterial expression vector (New England BioLabs) or a eukaryotic expression vector,
pcDNA (Promega). The final constructs are verified by sequence analysis.

Any suitable method of protein purification can be used to purify ZFPs of 
the invention (see, Ausubel, supra, Sambrook, supra). In addition, any suitable host can 
be used for expression, e.g., bacterial cells, insect cells, yeast cells, mammalian cells, and 
the like. 

Expression of a zinc finger protein, and optionally, a dimerizing peptide
linker, fused to a maltose binding protein (MBP-ZFP) in bacterial strain JM109 allows for
straightforward purification through an amylose column (NEB). High expression levels
of the zinc finger chimeric protein can be obtained by induction with IPTG since the
MBP-ZFP fusion in the pMal-c2 expression plasmid is under the control of the tac
promoter (NEB). Bacteria containing the MBP-ZFP fusion plasmids are inoculated into
2xYT medium containing 10 µM ZnCl2, 0.02% glucose, plus 50 µg/ml ampicillin and
shaken at 37°C. At mid-exponential growth IPTG is added to 0.3 mM and the cultures
are allowed to shake. After 3 hours the bacteria are harvested by centrifugation, disrupted
by sonication or by passage through a French pressure cell or through the use of lysozyme,
and insoluble material is removed by centrifugation. The MBP-ZFP proteins are captured
on an amylose-bound resin, washed extensively with buffer containing 20 mM Tris-HCl
(pH 7.5), 200 mM NaCl, 5 mM DTT and 50 µM ZnCl2, then eluted with maltose in
essentially the same buffer (purification is based on a standard protocol from NEB).
Purified proteins are quantitated and stored for biochemical analysis.

The dissociation constants of the purified proteins, e.g., Kd, are typically 
characterized via electrophoretic mobility shift assays (EMSA) (Buratowski & Chodosh, 
in Current Protocols in Molecular Biology pp. 12.2.1-12.2.7 (Ausubel ed., 1996)). 
Affinity is measured by titrating purified protein against a fixed amount of labeled
double-stranded oligonucleotide target. The target typically comprises the natural
binding site sequence flanked by the 3 bp found in the natural sequence and additional,
constant flanking sequences. The natural binding site is typically 9 bp for a three-finger
protein and 2 x 9 bp + intervening bases for a six finger ZFP. The annealed
oligonucleotide targets possess a 1 base 5' overhang which allows for efficient labeling of
the target with T4 phage polynucleotide kinase. For the assay the target is added at a
concentration of 1 nM or lower (the actual concentration is kept at least 10-fold lower
than the expected dissociation constant), purified ZFPs are added at various
concentrations, and the reaction is allowed to equilibrate for at least 45 min. In addition
the reaction mixture also contains 10 mM Tris (pH 7.5), 100 mM KCl, 1 mM MgCl2, 0.1
mM ZnCl2, 5 mM DTT, 10% glycerol, 0.02% BSA. (NB: in earlier assays poly d(IC)
was also added at 10-100 µg/µl.)

The equilibrated reactions are loaded onto a 10% polyacrylamide gel, 
which has been pre-run for 45 min in Tris/glycine buffer, then bound and unbound
labeled target is resolved by electrophoresis at 150V (alternatively, 10-20% gradient
Tris-HCl gels, containing a 4% polyacrylamide stacker, can be used). The dried gels are
visualized by autoradiography or phosphorimaging and the apparent Kd is determined by
calculating the protein concentration that gives half-maximal binding.
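
The half-maximal binding calculation can be illustrated numerically. The following Python fragment is a minimal sketch, not the procedure actually used: it assumes a simple one-site binding model in which the fraction of target bound equals [P]/([P] + Kd), which holds only when the target concentration is well below Kd (hence the 10-fold rule above), and it assumes the measured fractions have already been normalized to maximal binding. The apparent Kd is estimated by interpolating on a logarithmic concentration axis between the two titration points that bracket 50% binding. All names are hypothetical.

```python
import math


def fraction_bound(protein_nM, kd_nM):
    """One-site binding model, valid when the labeled target is far below Kd."""
    return protein_nM / (protein_nM + kd_nM)


def apparent_kd(concentrations_nM, fractions):
    """Estimate the protein concentration giving half-maximal binding.

    Interpolates linearly in log(concentration) between the two points that
    bracket a bound fraction of 0.5.
    """
    points = sorted(zip(concentrations_nM, fractions))
    for (c0, f0), (c1, f1) in zip(points, points[1:]):
        if f0 <= 0.5 <= f1 and f1 > f0:
            t = (0.5 - f0) / (f1 - f0)
            return math.exp(math.log(c0) + t * (math.log(c1) - math.log(c0)))
    raise ValueError("titration does not bracket half-maximal binding")


# Synthetic titration generated from the model with Kd = 1 nM.
concs = [0.1, 0.3, 1.0, 3.0, 10.0]
fracs = [fraction_bound(c, 1.0) for c in concs]
kd_estimate = apparent_kd(concs, fracs)
```

With real gel data the estimate depends on how densely the titration samples the transition region, so closely spaced protein concentrations around the expected Kd are preferable.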

The assays can also include determining active fractions in the protein 
preparations. Active fractions are determined by stoichiometric gel shifts where proteins
are titrated against a high concentration of target DNA. Titrations are done at 100, 50,
and 25% of target (usually at micromolar levels).

IV. Applications of ZFPs

ZFPs that bind to a particular target gene, and the nucleic acids encoding
them, can be used for a variety of applications. These applications include therapeutic
methods in which a ZFP or a nucleic acid encoding it is administered to a subject and
used to modulate the expression of a target gene within the subject (see copending
application 09/229,037 filed January 12, 1999). The modulation can be in the form of
repression, for example, when the target gene resides in a pathological infecting
microorganism, or in an endogenous gene of the patient, such as an oncogene or viral
receptor, that is contributing to a disease state. Alternatively, the modulation can be in
the form of activation when activation of expression or increased expression of an
endogenous cellular gene can ameliorate a diseased state. For such applications, ZFPs, or
more typically, nucleic acids encoding them are formulated with a pharmaceutically
acceptable carrier as a pharmaceutical composition.

Pharmaceutically acceptable carriers are determined in part by the
particular composition being administered, as well as by the particular method used to
administer the composition (see, e.g., Remington's Pharmaceutical Sciences, 17th ed.
1985). The ZFPs, alone or in combination with other suitable components, can be made
into aerosol formulations (i.e., they can be "nebulized") to be administered via inhalation.
Aerosol formulations can be placed into pressurized acceptable propellants, such as
dichlorodifluoromethane, propane, nitrogen, and the like. Formulations suitable for
parenteral administration, such as, for example, by intravenous, intramuscular,
intradermal, and subcutaneous routes, include aqueous and non-aqueous, isotonic sterile
injection solutions, which can contain antioxidants, buffers, bacteriostats, and solutes that
render the formulation isotonic with the blood of the intended recipient, and aqueous and
non-aqueous sterile suspensions that can include suspending agents, solubilizers,
thickening agents, stabilizers, and preservatives. Compositions can be administered, for
example, by intravenous infusion, orally, topically, intraperitoneally, intravesically or
intrathecally. The formulations of compounds can be presented in unit-dose or multi-
dose sealed containers, such as ampules and vials. Injection solutions and suspensions
can be prepared from sterile powders, granules, and tablets of the kind previously
described.

The dose administered to a patient should be sufficient to effect a
beneficial therapeutic response in the patient over time. The dose is determined by the
efficacy of the particular ZFP employed, the target cell, and the condition of the
patient, as well as the body weight or surface area of the patient to be treated. The size of
the dose also is determined by the existence, nature, and extent of any adverse side-effects
that accompany the administration of a particular compound or vector in a particular
patient.

In other applications, ZFPs are used in diagnostic methods for sequence-specific
detection of target nucleic acid in a sample. For example, ZFPs can be used to
detect variant alleles associated with a disease or phenotype in patient samples. As an
example, ZFPs can be used to detect the presence of particular mRNA species or cDNA
in a complex mixture of mRNAs or cDNAs. As a further example, ZFPs can be used to
quantify copy number of a gene in a sample. For example, detection of loss of one copy
of a p53 gene in a clinical sample is an indicator of susceptibility to cancer. In a further
example, ZFPs are used to detect the presence of pathological microorganisms in clinical
samples. This is achieved by using one or more ZFPs specific to genes within the
microorganism to be detected. A suitable format for performing diagnostic assays
employs ZFPs linked to a domain that allows immobilization of the ZFP on an ELISA
plate. The immobilized ZFP is contacted with a sample suspected of containing a target
nucleic acid under conditions in which binding can occur. Typically, nucleic acids in the
sample are labeled (e.g., in the course of PCR amplification). Alternatively, unlabelled
probes can be detected using a second labelled probe. After washing, bound labelled
nucleic acids are detected.

ZFPs also can be used for assays to determine the phenotype and function
of gene expression. Current methodologies for determination of gene function rely
primarily upon either overexpression or removing (knocking out completely) the gene of
interest from its natural biological setting and observing the effects. The phenotypic
effects observed indicate the role of the gene in the biological system.

One advantage of ZFP-mediated regulation of a gene relative to
conventional knockout analysis is that expression of the ZFP can be placed under small
molecule control. By controlling expression levels of the ZFPs, one can in turn control
the expression levels of a gene regulated by the ZFP to determine what degree of
repression or stimulation of expression is required to achieve a given phenotypic or
biochemical effect. This approach has particular value for drug development. By putting
the ZFP under small molecule control, problems of embryonic lethality and
developmental compensation can be avoided by switching on the ZFP repressor at a later
stage in mouse development and observing the effects in the adult animal. Transgenic
mice having target genes regulated by a ZFP can be produced by integration of the
nucleic acid encoding the ZFP at any site in trans to the target gene. Accordingly,
homologous recombination is not required for integration of the nucleic acid. Further,
because the ZFP is trans-dominant, only one chromosomal copy is needed and therefore
functional knock-out animals can be produced without backcrossing.

Although the foregoing invention has been described in detail for purposes of clarity of
understanding, it will be obvious that certain modifications may be practiced within the
scope of the appended claims. All publications and patent documents cited herein are hereby
incorporated by reference in their entirety for all purposes to the same extent as if each
were so individually denoted.
ABSTRACT Peptides that mediate dimerization of attached
zinc finger DNA-binding domains have been evolved in
vitro starting from random sequences. We first used phage 
display to select dimerization elements from libraries of 
random 15-residue polypeptides that were fused to the N 
terminus of the zinc finger domains. We then reoptimized 
these peptides by sequentially randomizing five-residue blocks 
(proceeding across the peptide in three steps) and selecting 
variant peptides that further stabilized the protein-DNA 
complex. Biochemical experiments confirmed that the se- 
lected peptides promote dimerization of the zinc fingers on an 
appropriate DNA target site. These results demonstrate that 
dimerization units can be obtained readily from random 
polypeptide libraries of moderate complexity. Our success 
reemphasizes the utility of searching random peptide libraries 
in protein design projects, and the sequences presented here 
may be useful when designing novel transcription factors. 



The affinity and specificity of DNA-binding proteins depend 
not only on interactions with the DNA but also on interactions 
with proteins that bind at neighboring sites. Such protein- 
protein interactions may involve homo- or heterodimerization 
or the assembly of multiprotein complexes. Dimerization 
strategies already have been tested, in structure-based design 
efforts, to create DNA-binding proteins with enhanced affinity 
and specificity. In the first such study (1), computer modeling 
was used to design a fusion between zinc finger subdomains 
from Zif268 and the dimerization element from Gal4. Recent 
design efforts with zinc fingers also have used leucine zipper 
dimerization motifs in an analogous manner (S. Wolfe, E. 
Ramm, and C.O.P., unpublished data). 

The selection of dimerization elements from libraries of 
random peptides represents an intriguing alternative to struc- 
ture-based design and raises many interesting questions. How 
common are functional dimerization units? Do the selected 
structures always resemble known motifs? Can we obtain new 
dimerization units that would be useful when designing tran- 
scription factors for potential applications in gene therapy? 

Phage display (reviewed in refs. 2 and 3) provides a powerful 
method for selecting functional peptides from large popula- 
tions of random polypeptides. Peptide libraries displayed on 
phage often have been screened for peptide-protein interac- 
tions in studies that focus on epitope mapping, analysis of 
substrate specificity, and the development of leads for drug 
design. Peptides that can substitute for larger protein domains 
also have been generated, either through stepwise minimiza- 
tion and reoptimization of a naturally occurring domain or by 
selection from random sequence libraries (reviewed in ref. 4). 
In one study, a peptide selected to bind the erythropoietin 
receptor (5) was found to induce dimerization of the receptor- 
peptide complex (6), demonstrating that self-associating peptides
can, at least under some circumstances, be isolated
from random polypeptide sequences.

In this study, we used phage display to select and optimize 
peptides that mediate dimerization of DNA-binding modules. 
Our work may have practical implications in the design of 
DNA-binding proteins and, more generally, demonstrates how 
random peptide extensions provide a basis for selecting pro- 
teins with desired functions. These results also may have 
implications regarding the role of protein-protein interactions 
in the evolution of transcription factors. 

MATERIALS AND METHODS 
Phage Display Libraries. Phagemid vectors used in the
selections were created from pZif12 (7) by restoring the
reading frame between the Zif12-coding region and gene III
and by introducing convenient restriction sites at the start of
Zif12. Libraries containing randomized peptides were constructed
by cassette mutagenesis, using NN(G/C/T) randomized
codons for the initial libraries and NN(G/T) for the
reoptimization libraries. The complete fusion protein used for
phage display (Fig. 1A) contained a PelB signal sequence; a
short leader peptide (NH2-EPRAQNS in initial selections and
NH2-EP in reoptimizations); the random peptide; residues
4-60 of Zif268 (numbering as in ref. 8); a linker that includes
an amber codon; and residues 23-424 of the M13 gene III product.
The ligated phagemid libraries were electroporated into XL-1
Blue E. coli cells, yielding ≈10^8 transformants for the initial
selection libraries and ≈10^9 transformants for each of the
reoptimization libraries.
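
The two degenerate codon schemes named above can be compared with a short calculation. The following is a minimal sketch, assuming the standard genetic code; the function name `scheme_stats` and the variable names are illustrative and do not come from the original methods. It counts the codons, encoded amino acids, and stop codons for NN(G/C/T) and NN(G/T), and compares the DNA-level diversity of one fully randomized five-codon block with the ≈10^9 transformants reported for the reoptimization libraries.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def scheme_stats(third_position_bases):
    """Summarize a degenerate codon NN(x), where x is the set of allowed third bases."""
    codons = [a + b + c for a in BASES for b in BASES for c in third_position_bases]
    amino_acids = {CODON_TABLE[c] for c in codons if CODON_TABLE[c] != "*"}
    stops = sorted(c for c in codons if CODON_TABLE[c] == "*")
    return {"codons": len(codons), "amino_acids": len(amino_acids), "stops": stops}


initial = scheme_stats("GCT")  # NN(G/C/T), used for the initial libraries
reopt = scheme_stats("GT")     # NN(G/T), used for the reoptimization libraries

# One five-residue block, fully randomized with NN(G/T) codons.
block_dna_variants = reopt["codons"] ** 5
block_peptide_variants = 20 ** 5
transformants = 1e9  # approximate library size quoted in the text

print("NN(G/C/T):", initial)
print("NN(G/T):  ", reopt)
print("DNA variants per block:", block_dna_variants)
print("Peptide variants per block:", block_peptide_variants)
print("Transformants per DNA variant:", transformants / block_dna_variants)
```

Under these assumptions both schemes encode all 20 amino acids and admit only the amber (TAG) stop codon, and a library of ≈10^9 transformants oversamples the 32^5 possible five-codon blocks roughly 30-fold.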

Phage Selections. For the initial selections, phage were
grown, harvested, and processed essentially as described previously
for zinc finger phagemid selections (7). Selections
during the block-reoptimization steps were conducted similarly,
but with the following changes. Binding reactions
included 2 mM DTT to minimize the risk of selecting disulfide-bonded
dimer interfaces. Phage-DNA complexes were captured
by streptavidin-coated paramagnetic beads (Dynal,
Great Neck, NY) that had been equilibrated in pZif12 wash
buffer (7). Five microliters of a 10-mg/ml suspension of beads
was used to capture up to 10 pmol of DNA site (with bound
phage). The beads were washed (five times using 0.5 ml for
each 8-min wash) and treated with high-salt elution buffer (7).
To increase stringency, (i) binding, capture, and elution were
performed at 37°C in the second and third reoptimization
steps, and (ii) in the third step, the target DNA site contained
a mutation in one of the Zif12-binding sites (the half-site distal
to the biotin was TGAGCG). The target DNA concentration
also was lowered through the course of the reoptimization to
help force competition among members of the phage pool and
to further increase stringency. In the first block-reoptimization
step, the target DNA concentration was reduced from 40 nM
(cycles 1-3) to 8 nM (cycles 4-6) and then to 2 nM (cycles 7-9),
with the salmon sperm competitor DNA concentration lowered
proportionately at each stage. In the second step, concentrations
were 20 nM (cycle 1) or 2 nM. In the third step, the
mutant DNA was present at 1 nM throughout. In the later
stages of each block-reoptimization step, we estimate that the
phage concentration was 10- to 20-fold higher than the target
DNA.

Protein Production and Purification. DNA fragments encoding
peptide-Zif12 fusions (with Met-Ala at their N terminus)
were cloned into pET-21d (Novagen) and expressed in
BL21(DE3) or BL21(DE3) pLysS cells. Cultures were induced,
lysed, and sonicated as recommended (Novagen).
Peptides were present in insoluble inclusion bodies and were
purified by reversed-phase batch extraction (using Waters
Sep-pak C18 cartridges) and reversed-phase HPLC as described
(8). The Zif12 peptide (amino acids 2-59 of Zif268)
and peptide 2 were expressed as glutathione S-transferase
(GST) fusion proteins from pGEX-2T and pGEX-6P-3 (Pharmacia),
respectively. These peptides were purified by affinity
chromatography and cleaved from the GST as directed (Pharmacia),
leaving a Gly-Ser dipeptide at the N terminus of Zif12
and a heptapeptide (GPLGSDP) at the N terminus of peptide
2. The cleaved peptides were purified further by reversed-phase
HPLC. All peptides were reconstituted from lyophilized
HPLC fractions and refolded as described (9), and their
concentrations were quantified by comparison with BSA standards
in SDS/PAGE using Coomassie staining. For peptides
that gave a stable gel shift (peptides 1*, 3, and 5*), the active
concentration of peptide was determined as described (9), and
we found that each of these samples was fully active for DNA
binding.

DNA-Binding Assays. Gel mobility-shift assays (10) and 
DNase I footprinting experiments (11) were used to assess the 
DNA-binding activity of various peptides. Only peptides 1*, 3, 
and 5* produced complexes that were sufficiently stable for 
quantitative gel-shift assays, and footprinting was used to 
measure the affinity of the other peptides. 

Labeled DNA probes were generated as follows. For the
gel-shift studies shown in Fig. 2B, oligos corresponding to the
phage-selection target site
(5'-GGTTGCAGTGGGCGCGCCCACAGTACTTGAACGTAACG-3' and
5'-CGTTACGTTCAAGTACTGTGGGCGCGCCCACTGC-3', Zif12 sites in
bold) or a single-site mutant (bold regions above replaced with
the sequences 5'-TGGGCGTATGCT-3' and 5'-AGCATACGCCCA-3')
were annealed and end-labeled with Klenow.
A labeled restriction fragment was used for quantitative
studies. The oligos
5'-GGAATTCCTGATCAAGATCTGGTCACGTCCATAGGCTAGGCATGTCAAGGCTGTATG-3' and
5'-GGGATCCACTCGCGAACGCGTCCTTGTAGTGGGCGCGCCCACATACAGCCTTGACAT-3' (Zif12
sites in bold) were annealed, extended by mutually primed
extension, and cloned into the EcoRI and BamHI sites of
pBluescript II SK(+). The probe was prepared by digesting the
plasmid with EcoRI and NotI; labeling the DNA with Klenow,
[α-32P]dCTP, and [α-32P]dGTP; and purifying the small fragment
by native PAGE.
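
The probe design described above can be sanity-checked in a few lines. The following is a minimal sketch, assuming the oligo sequences as transcribed from the text with line-break hyphens and stray spaces removed; the helper names `revcomp` and `overlap_3prime` are illustrative and not part of the original methods. It checks that the two selection-site oligos are complementary, that the 12-bp site is a perfect inverted repeat, and that the two long oligos anneal at their 3' ends so that mutually primed extension can fill in both strands.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def overlap_3prime(a, b):
    """Longest k such that the last k nt of a base-pair with the last k nt of b."""
    rc_b = revcomp(b)
    for k in range(min(len(a), len(b)), 0, -1):
        if a[-k:] == rc_b[:k]:
            return k
    return 0


# Gel-shift probe oligos (phage-selection target site).
top = "GGTTGCAGTGGGCGCGCCCACAGTACTTGAACGTAACG"
bottom = "CGTTACGTTCAAGTACTGTGGGCGCGCCCACTGC"
site = "TGGGCGCGCCCA"  # inverted repeat of the TGGGCG half-site

# Oligos used to build the restriction fragment for quantitative studies.
oligo_a = "GGAATTCCTGATCAAGATCTGGTCACGTCCATAGGCTAGGCATGTCAAGGCTGTATG"
oligo_b = "GGGATCCACTCGCGAACGCGTCCTTGTAGTGGGCGCGCCCACATACAGCCTTGACAT"

k = overlap_3prime(oligo_a, oligo_b)
extended = oligo_a + revcomp(oligo_b)[k:]  # top strand after mutually primed extension

print("bottom pairs with top (4-nt 5' overhang):", revcomp(bottom) == top[4:])
print("site is its own reverse complement:", revcomp(site) == site)
print("3' overlap between long oligos:", k, "nt")
print("extended duplex length:", len(extended), "bp")
print("EcoRI and BamHI sites present:", "GAATTC" in extended, "GGATCC" in extended)
```

The 4-nt 5' overhang left on the annealed gel-shift probe is what allows end-labeling by Klenow fill-in.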

Binding reactions (typically 10 µl) contained the labeled
DNA site (at >100-fold below the protein concentration at
half-maximal binding), protein (for quantitative assays, we
used 1.3- to 2.0-fold dilution steps over a range of four orders
of magnitude), and a buffer containing 15 mM Hepes, pH 7.8,
60 mM potassium acetate, 60 mM potassium glutamate, 5 mM
MgCl2, 20 µM ZnSO4, 5% glycerol, 0.1% Nonidet P-40, 1 mM
DTT, and 0.1 mg/ml acetylated BSA. After equilibrating the
binding reactions at 4°C (1.5-16 hr, depending on the peptide),
the reactions either were resolved by native PAGE (7.5%
37.5:1 acrylamide/bisacrylamide with 2.5% glycerol, run at
4°C in electrophoresis buffer containing 25 mM Tris, 190 mM
glycine, and 1 mM EDTA) or treated with DNase I. DNase I
reactions (4 min, 4°C using 2.5 µl of 30 µg/ml DNase I for a
10-µl reaction) were terminated, prepared, and electrophoresed
as described (12). Data were collected by using a
PhosphorImager (Molecular Dynamics).

To determine dissociation constants ($K_d$) for the fusion
peptides, binding data for the selected peptides were fit by
nonlinear regression to the equation $\theta = 1/(1 + K_d/[P]^2)$,
where $\theta$ is the fraction of DNA bound and $[P]$ is the concentration
of free protein, which approximately equals the total
protein concentration in our experiments. This equation describes
the binding of two protein molecules to a DNA
molecule with strict cooperativity. Data for Zif12, which
bound essentially noncooperatively, were fit to the equation
$\theta = 1/(1 + K_d'/[P])$, where $K_d'$ is the dissociation constant for
a Zif12 monomer; the corresponding $K_d$ for the overall reaction
of two Zif12 monomers with the DNA was calculated as
$(K_d')^2$.
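
The two binding models above can be written out directly. The following is a minimal sketch: the function names are illustrative, the data are synthetic and noise-free, and a brute-force least-squares search over log10(Kd) stands in for the nonlinear regression used by the authors, whose fitting software is not specified in this section. The value 2.3 × 10^-16 M^2 is the Table 1 entry for peptide 1*.

```python
import math


def theta_cooperative(p, kd):
    """Fraction of DNA bound when two proteins bind with strict cooperativity.

    p is the free protein concentration in M; kd is in M^2.
    """
    return 1.0 / (1.0 + kd / p ** 2)


def theta_monomer(p, kd_prime):
    """Fraction of DNA bound for simple noncooperative binding; kd_prime in M."""
    return 1.0 / (1.0 + kd_prime / p)


def fit_log_kd(conc, frac, model, lo=-20.0, hi=-2.0, step=0.001):
    """Brute-force least-squares search for log10(Kd) on a regular grid."""
    n = int(round((hi - lo) / step))
    best_sse, best_log = float("inf"), lo
    for i in range(n + 1):
        log_kd = lo + i * step
        kd = 10.0 ** log_kd
        sse = sum((model(p, kd) - f) ** 2 for p, f in zip(conc, frac))
        if sse < best_sse:
            best_sse, best_log = sse, log_kd
    return best_log


# Synthetic titration: 2-fold dilution steps from 0.1 nM upward.
true_kd = 2.3e-16
conc = [1e-10 * 2 ** i for i in range(20)]
frac = [theta_cooperative(p, true_kd) for p in conc]

fitted_kd = 10.0 ** fit_log_kd(conc, frac, theta_cooperative)
half_max = math.sqrt(fitted_kd)  # theta = 0.5 when [P]^2 = Kd
print(f"fitted Kd = {fitted_kd:.2e} M^2, half-maximal binding = {half_max * 1e9:.1f} nM")
```

For the cooperative model, half-maximal binding occurs at [P] = sqrt(Kd), which is why a Kd of 2.3 × 10^-16 M^2 corresponds to half-maximal binding near 15 nM.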

Sedimentation Equilibrium. Peptide samples at three concentrations,
ranging from ≈10 µM to ≈100 µM, were centrifuged
to equilibrium in a Beckman Optima XL-A at 20,000 and
30,000 rpm at 4°C in a buffer containing 15 mM Tris, pH 7.8,
150 mM KCl, 5 mM MgCl2, 20 µM ZnSO4, and 0.2 mM DTT.
Solvent density and partial specific volumes of peptides were
calculated as described (13). The sedimentation data were
analyzed by methods described in refs. 14 and 15, using the
program NONLIN (16).

RESULTS 

To select dimerization motifs, we attached random peptides to
a DNA-binding domain and selected those fusion proteins that
could bind more stably to a symmetric DNA site (Fig. 1).
Random 15- and 30-residue peptides were expressed at the
amino terminus of the first two zinc fingers of Zif268 (8, 17)
(we refer to this two-finger peptide as Zif12), and these
peptide-Zif12 fusions were displayed on filamentous bacteriophage.
Phage from the 15- and 30-mer libraries, representing




C
5'-GCAGTGGGCGCGCCCACAGTACTTGAACGTAACG-Bio
3'-CGTCACCCGCGCGGGTGTCATGAACTTGCATTGC

Fig. 1. (A) Sketch showing key segments of the phagemid. (B)
Expected arrangement of fusion proteins at the target DNA. Phage
displaying two copies of a dimerizing peptide-Zif12 fusion can form
stable complexes with the biotinylated target DNA site, which contains
an inverted repeat of the Zif12-binding site. The phage-DNA complexes
are captured by streptavidin coupled to a solid support, and
phage that bind less tightly are washed away. (C) The DNA site used
for affinity selection of phage, with the two juxtaposed Zif12-binding
sites in bold.
10^8 different sequences from each library, were pooled, and
our affinity-selection protocol was used with a target DNA
duplex containing an inverted repeat of the Zif12-binding site.
The original Zif12 peptide, which lacks any N-terminal extension,
binds specifically, but weakly, to the "half-site" sequence
TGGGCG, and Zif12 phage are not retained by the target
DNA. Therefore, our protocol enriches for phage that display
peptides that augment the DNA-binding activity of the zinc
fingers.

After seven cycles of selection and amplification, the phage
pool bound more than 100-fold more efficiently than the initial
random libraries, indicating successful enrichment of higher-affinity
phage. We sequenced 45 clones from this final phage
pool and found 6 different 15-mer peptide sequences (Fig. 2A).
These peptides did not share any obvious homology, aside
from a basic residue present at the final position of each
sequence. BLAST searches (18) showed no significant similarity
between the selected peptides and known natural proteins.

To assess their DNA-binding activity, peptides obtained in
these initial selections were expressed in Escherichia coli (as
fusions with Zif12), purified, and tested in gel mobility-shift
A
peptide 1: GGGQWLGTWEWYGPK (10)
        2: YEKISVEGIKDVRVR (9)
        3: NVSIEGVLKYYRGLR (6)
        4: RSCGLDYEGYWUtLK (13)
        5: SRWLEEEVSRLLLLR (6)
        6: GEALDRFEREMKLMR (1)

B
[Gel image; probe labels: inverted repeat, single site]

Fig. 2. (A) Sequences of peptide extensions isolated from the
initial selection. Numbers in parentheses give the frequency of
occurrences among the 45 clones sequenced. The clones for peptide 2
included a Glu-21-to-Asp mutation in the zinc finger region that may
have been partially responsible for the affinity of this peptide. (B) Gel
mobility-shift assays using purified fusion peptides 1, 3, and 5. Protein
(2.5 µM, 250 nM, 25 nM, 2.5 nM, 250 pM, and no protein) was
incubated with DNA containing either an inverted repeat of Zif12 sites
or a single Zif12 site and then electrophoresed through native
polyacrylamide gels. The reduced mobility of the inverted repeat probe in
the presence of protein indicates the formation of protein-DNA
complexes. Similar results were obtained with fusion peptides 2 and 6,
but data are not shown because these peptides were not studied
further. Binding of peptide 4, also not shown, appeared to depend on
disulfide bond formation, and this peptide was not pursued further.



Table 1. Affinities for DNA duplexes containing the inverted
repeat site 5'-TGGGCGCGCCCA-3'

Peptide    Half-maximal binding, nM    K_d, M^2
Zif12      9,600                       9.2 (±3.3) × 10^-11
1          410                         1.7 (±0.44) × 10^-13
3          37                          1.4 (±0.04) × 10^-15
5          440                         1.9 (±0.13) × 10^-13
1*         15                          2.3 (±0.35) × 10^-16
5*         12                          1.4 (±0.27) × 10^-16

Half-maximal binding is given in concentration o
represents that for binding of a dimer. Data shown are mean (±SD)
for three independent experiments.

assays (Fig. 2B). These fusion peptides bound specifically to
duplex DNA containing an inverted repeat of the Zif12 sites,
showing little or no activity with DNA containing a single site.
The complexes with the inverted-repeat site all migrated
similarly in the gel, with a mobility for the complexes that was
consistent with the formation of dimers. The isolated Zif12
zinc finger domains did not shift either DNA site under these
conditions.

Quantitative DNA-binding assays were used to investigate
further the affinity and cooperativity for binding of several of
these fusion proteins (Table 1). When the DNA contained an
inverted repeat of the Zif12-binding site, fusion peptides 1, 3,
and 5 bound substantially tighter than did Zif12 alone. Scatchard
analysis demonstrated that binding of these peptides to
the inverted repeat is second order with respect to protein, as
expected for a species that exists as a monomer in solution, but
that binds the DNA site as a dimer. (This analysis also showed
that Zif12 binds the inverted repeat with slight cooperativity,
but the data for Zif12 were more consistent with a first-order
than a second-order reaction.)
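
One way to see what "second order with respect to protein" means is a Hill-type plot, in which the slope of log(θ/(1-θ)) against log[P] gives the apparent reaction order. The sketch below uses that log-log slope rather than the Scatchard analysis the authors report, and it runs on noise-free synthetic data generated from the Table 1 values for peptide 3 and Zif12; the function names are illustrative.

```python
import math


def hill_slope(conc, frac):
    """Least-squares slope of log10(theta / (1 - theta)) versus log10([P])."""
    xs = [math.log10(p) for p in conc]
    ys = [math.log10(f / (1.0 - f)) for f in frac]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate(kd, order, half_max, points=9):
    """Noise-free fractions bound at concentrations from 0.1x to 10x half-max."""
    conc = [half_max * 10 ** (-1 + 2 * i / (points - 1)) for i in range(points)]
    frac = [1.0 / (1.0 + kd / p ** order) for p in conc]
    return conc, frac


# Peptide 3: strictly cooperative dimer model (Kd in M^2, half-max 37 nM).
dimer_slope = hill_slope(*simulate(kd=1.4e-15, order=2, half_max=37e-9))
# Zif12: noncooperative monomer model (Kd' in M, half-max 9.6 uM).
monomer_slope = hill_slope(*simulate(kd=9.6e-6, order=1, half_max=9.6e-6))

print(f"dimer model slope:   {dimer_slope:.3f}")
print(f"monomer model slope: {monomer_slope:.3f}")
```

With real data the slope would fall between these limits, and a value close to 2 would indicate that two protein molecules bind together, as reported for peptides 1, 3, and 5.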



[Schematic: original peptide; block 2 reselection; optimized peptide]

Fig. 3. Overall scheme for sequential reoptimization of peptides. 
A 15-residue peptide obtained from the initial selection was divided 
conceptually into three blocks of 5 aa each and reoptimized in three 
steps. In the first step, the five-residue block closest to the fingers was 
completely randomized, with the other 10 aa held constant. Phage 
display of the new fusion proteins, with six to nine selection and 
amplification cycles, was used to obtain the best sequences from this 
pool. In the second reoptimization step, the central five-residue block 
was completely randomized, with the finger-proximal region held 
constant as the newly optimized sequence and the finger-distal region 
corresponding to the initially selected sequence. The best sequences 
from this pool were obtained again via phage display with a series of 
selection and amplification cycles. In the final reoptimization step, the 
finger-distal five-residue block was completely randomized and then 
reselected in the context of the two other reoptimized blocks. 



Biochemistry: Wang and Pabo 



Proc. Natl. Acad. Sci. USA 96 (1999) 9571 



Because our initial search could test only a tiny fraction of 
all possible 15-mer sequences, we expected that we could use 
our initial peptides as a starting point for the "evolution" of 
even more efficient dimerization motifs. We developed a 
sequential reoptimization strategy (Fig. 3) to try to improve 
the dimerization properties of fusion peptides obtained in our 
initial selections. Our strategy conceptually divided each 15- 
mer peptide into three five-residue blocks. In the first reop- 
timization step, we completely randomized the block closest to 
the fingers (with the other residues held constant) and selected 
for sequences with even higher affinity for the symmetric DNA 
site. The second and third blocks were randomized and 
reselected in subsequent reoptimization steps. During this 
procedure, we took several measures to increase the stringency 
of the selection conditions. The concentration of DNA target 
was lowered 40-fold over the course of the reoptimization 
steps, and the temperature of the binding reaction was raised 
from 23 to 37°C. In the third reoptimization step, a mutation 
was introduced into one of the Zif12 sites to weaken the
protein-DNA interface (and thus create greater selective 
pressure for effective dimerization). In the later cycles of each 
reoptimization step, phage were present in excess of the DNA 
target, forcing direct competition among the remaining phage 
for the limited number of binding sites and, thus, favoring 
selection of the tightest-binding sequences from each pool. 

Progress of the sequential reoptimization protocol was
monitored by sequencing phage pools at a number of stages.
The full reoptimization strategy was applied to fusion peptides
1 and 5 from the initial selections (Fig. 4). Peptide 3, which also
was reoptimized, yielded variants that appeared to form higher-order
oligomers and therefore was not studied in detail
(data not shown). We typically used six to nine selection-amplification
cycles for the reoptimization of any particular
five-residue block. One of the final sequences, resembling the
consensus for the set, was then chosen for use in the next step.
In some cases, choosing a consensus sequence was complicated
somewhat by the presence of spurious mutations (Fig. 4) in
nonrandomized portions of the peptide. Because such mutations
probably conferred some other selective advantage (independent
of the peptide block targeted for reoptimization),
clones carrying them were not used in assigning a consensus.
The final selected peptides, with all three five-residue blocks
reoptimized, were designated as 1* and 5* to indicate that
they had been obtained by sequentially reoptimizing peptides
1 and 5. A variant form of peptide 1* also was chosen for
further analysis (Fig. 4).

The reoptimized fusion peptides were expressed and puri- 
fied, and their DNA-binding properties were assessed with 
quantitative gel-shift assays (Table 1). For fusion peptides 1* 
and 5*, half-maximal DNA binding was observed with peptide 
concentrations in the nanomolar range, demonstrating that the 
reoptimization process produced peptides with significantly 
higher affinity. The variant form of peptide 1* also bound very 
tightly, but appeared to form higher-order complexes with the 
DNA, and this peptide was not studied further. 
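
The gain in affinity can also be expressed as a binding free energy. The following is a minimal sketch, assuming ΔG = RT ln Kd with a 1 M standard state and a temperature of 4°C (the temperature at which the binding reactions were equilibrated); the Kd values are those of Table 1, and the function and variable names are illustrative. The section does not state how the authors computed their free energies, so this is only one consistent way to do the arithmetic.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)
T_KELVIN = 277.15     # 4 degrees C; assumed from the binding-assay conditions

# Overall dissociation constants from Table 1, in M^2.
KD_M2 = {
    "Zif12": 9.2e-11,
    "1": 1.7e-13,
    "3": 1.4e-15,
    "5": 1.9e-13,
    "1*": 2.3e-16,
    "5*": 1.4e-16,
}


def delta_g(kd):
    """Free energy of binding in kcal/mol (negative means favorable)."""
    return R_KCAL * T_KELVIN * math.log(kd)


dg = {name: delta_g(kd) for name, kd in KD_M2.items()}
# Extra stabilization relative to the isolated Zif12 domain.
extension = {name: dg["Zif12"] - g for name, g in dg.items() if name != "Zif12"}

for name, g in dg.items():
    extra = extension.get(name, 0.0)
    print(f"{name:>5}: dG = {g:6.1f} kcal/mol, peptide contribution = {extra:4.1f} kcal/mol")
```

Under these assumptions the reoptimized peptides come out near -20 kcal/mol and Zif12 near -12.7 kcal/mol, in line with the figures quoted in the Discussion.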

Sedimentation equilibrium experiments were conducted
with several fusion peptides to determine their oligomeric
state in solution. Peptide 1*, peptide 5, and the isolated Zif12



peptide 1      GGGQW LGTWE WYGPK - Zif12

Step 1         GGGQW LGTWE HPKMK
(Cycle 6)      GGGQW LGTWE PAKIR
               GQGQW LGTWE VPKSR
               GGGQW LGTWE VPRLK
               GGGQW LGTWE APKIiR
               GGGQW LGTWE HAKIR
               GGGQW LGTWE WKMR
               GGGQW LGTWE PVXMR
(Cycle 9)      ^GQW LGTWE [VPKQR]

Step 2         GGGQW LLNYK VPKQR
(Cycle 9)      GGGQW [LLNYV] VPKQR
               GGGQW LLDYI VPKQR
               GGGQW LLNYI VPKQR
               GGGQW LLQYV VPKQR
               GGGQW LLNYV VPKQR
               GGGQW LLEYK VPKQR
               GGGQW LLDYV VPKQR

Step 3         [HPMNN] LLNYV VPKMR
(Cycle 6)      HPMNN LLNYV VPKMR
               PPSTE LLNYV VRKLR
               QKYGD LLNYV VRKLR
               [ENYEK] LLNYV VRKLR
               EKYEK LLNYV VRKLR

peptide 1*     HPMNN LLNYV VPKMR - Zif12
peptide 1*     ENYEK LLNYV VRKLR - Zif12
(variant)

peptide 5      SRWLE EEVSR LLLLR - Zif12

Step 1         FRWLE EEVSR MRLWR
               FRWLE EEVSR MRGWK
               FRWLE EEVSR MRGWK
(Cycle 9)      SRWLE EEVSR [MRKWR]
               SRWLE EEVSR MRKWR
               SRWLE EEVSR MRKWR
               SRWLE EEVSR MRKWK
               SRWLE EEVSR MGVMR E21D

Step 2         SRWLE [EYLES] MRKWR
               SRWLE DYVTQ MRKWR
               SRWLE DYLAD MRKWR
               SRWLE EYLTF MRKWR
               SRWLE QYLED MRKWR
               SRWLE DYVSQ MRKWR
               SRWLE SYLDK MRKWR
               SRWLE EYMSD MRKWR

Step 3         QPWLT EYLES MRKWR
(Cycle 6)      PPWLI EYLES MRKWR
               PPWLK EYLES MRKWR
               [PAWLT] EYLES MRKWR
               PAWLA EYLES MRKWR
               WAWLD EYLES MRKWR
               PPWLK EYLES MRKWR
               PTWLT EYLES MRKWR

peptide 5*     PAWLT EYLES MRKWR - Zif12



Fig. 4. Evolution of peptides 1 and 5 by sequential block reoptimization. The sequences selected from each reoptimization step are shown in 
bold, with the number of selection and amplification cycles given in parentheses. Sequences roughly matching the consensus that were used in later 
steps have been boxed. In some cases, such as in reoptimization step 3 for peptide 1 and reoptimization step 1 for peptide 5, we isolated clones 
that carried spurious mutations at a nondegenerate position of the peptide extension. In addition, the E21D mutation in the zinc finger region (which 
also was seen in the original peptide 2 sequence) arose several times; this mutation may stabilize complex formation by improving contacts at the 
protein-DNA interface. [Note: some confusion was caused by this E21D mutation, which occurred in the first reoptimization step for peptide 1, 
but was discovered only after reoptimization step 2. Thus, the "consensus" sequence from reoptimization step 1 (VPKQR), chosen after 
selection-amplification cycle 9, had a glutamine that did not occur in sequences isolated after cycle 6. To double-check this position of peptide 
1, it was randomized again during reoptimization step 3. The corresponding position was allowed to vary as Q, M, I, or L, along with the complete 
randomization of the third block. The reselections showed that methionine or leucine is preferred at this position.] 
domain were monomeric at all concentrations tested (up to
≈100 µM). Peptide 5* was monomeric at concentrations up to
50 µM, above which the peptide appeared to form higher-order
aggregates (apparently tetramers). These results confirmed
that our selected peptides exist as monomers in solution
at the concentrations used in the DNA-binding assays.

DISCUSSION 

Protein-protein interactions can play important roles in pro- 
tein-DNA recognition by facilitating cooperative binding. In 
this project, we sought to "evolve" stable zinc finger dimer- 
ization elements starting from libraries of random polypep- 
tides. Broadly speaking, our goals in this study were 3-fold: (i) 
to gain some impression about the frequency of functional 
dimerization elements in a pool of random polypeptides; (ii) to 
explore the utility of a sequential block-reoptimization strategy 
to improve the activity of selected peptides; and (iii) to
generate dimerization elements for zinc finger proteins that 
may be useful in future efforts to create designer DNA-binding 
proteins for applications in gene therapy. 

Given the length of the random peptides tested in this study, 
phage display allows one to search only a tiny fraction of the 
relevant sequence space. We screened about 10^8 sequences
from each library, but there are 10^19 possible 15-mers and 10^39
possible 30-mers. The success of the initial screen, which 
yielded several different peptides that mediate dimerization, 
suggests that such peptides are relatively "common" in se- 
quence space. Zhang et al. (19) have isolated dimerization 
elements by fusing random fragments of the yeast genome to 
the DNA-binding domain of lambda repressor and selecting 
fusion proteins that reconstitute repressor activity. This group 
reached similar conclusions regarding the frequency of func- 
tional dimerization domains. Our findings may help explain 
why dimerization elements are so common and have such 
diverse sequences in natural DNA-binding proteins. The pep- 
tides that we have isolated may be analogous — in an evolu- 
tionary and functional sense — to the peptide extensions that 
are responsible for heterodimerization of certain homeodo- 
main proteins (20-22). 
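
The size of the sequence space quoted above follows from simple counting. The short sketch below assumes 20 possible amino acids at every position and takes the ≈10^8 sequences screened per library from the text; the variable names are illustrative.

```python
AMINO_ACIDS = 20
SCREENED = 10 ** 8  # approximate number of sequences screened per library

n_15mers = AMINO_ACIDS ** 15
n_30mers = AMINO_ACIDS ** 30

fraction_15 = SCREENED / n_15mers
fraction_30 = SCREENED / n_30mers

print(f"possible 15-mers: {n_15mers:.2e}, fraction screened: {fraction_15:.1e}")
print(f"possible 30-mers: {n_30mers:.2e}, fraction screened: {fraction_30:.1e}")
```

Even for the 15-mers, only a few parts in 10^12 of the possible sequences were sampled, which is the basis for the inference that functional dimerization peptides are relatively common.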

It is interesting that we obtained only 15-mer extensions in 
our initial selection, although the starting library consisted of 
equal numbers of fusion proteins with 15- and 30-residue 
N-terminal extensions. At this stage, the significance of this 
observation remains unclear. Our sample may be too small to 
determine the relative effectiveness of 15-mers and 30-mers as 
dimerization units, and it is possible that problems with 
processing and display on the phage surface become more 
severe with large random peptides. Considering the number of 
selection-amplification cycles used, even a modest difference 
in propagation efficiency between 15-mers and 30-mers could 
have resulted in a substantial bias in the final pool. 

During the natural evolution of a protein, many sequence 
variants are tested for improved activity. We adopted a gen- 
erally similar strategy, searching sequences related to the 
initial peptides, but we generated variants in a distinctive way. 
The peptides were reoptimized in three steps, with each step 
involving an exhaustive search of a five-residue sequence 
block. Envisioning that the zinc fingers would provide a 
relatively rigid structural framework, we began reoptimization 
with the five-residue block closest to the fingers and then 
proceeded outward. Because we completely randomize each 
block when it is reoptimized, our procedure systematically 
searches a large number of sequence variants that can differ 
dramatically from the initial peptide, and the final sequence 
may be altered (potentially) at every position. In this respect, 
our strategy encompasses a broader search than more tradi- 
tional mutagenesis schemes, which often involve creating 
variants of the initial sequence with a limited number of 
changes (see ref. 2 for a theoretical discussion of different 
mutagenesis strategies). Given the number of residues that are 
randomized in a reoptimization step, it is even possible that the 
overall fold of a reoptimized peptide will be different from the 
fold of the original peptide. 

Our "sequential block reoptimization" strategy was applied 
successfully to several fusion peptides and yielded variants with 
high DNA-binding affinity (Table 1). Assuming that the 
binding of the isolated Zif12 domain reflects the binding of the 
Zif12 moiety in the selected fusion peptides, the binding 
energy contributed by each peptide extension is represented by 
the free energy of binding for the fusion peptide minus that for 
Zif12 alone. The contribution of the peptide extension includes 
the energy of dimerization as well as any energy derived from 
contacts between the peptide extension and the DNA. For 
peptides 1* and 5*, this value is about 7.3 kcal/mol (i.e., 20.0 
kcal/mol - 12.7 kcal/mol). This is more than twice that 
contributed by peptides 1 and 5 (≈3.5 kcal/mol), which had 
been obtained in the initial selections. The DNA-binding 
affinities of our optimized peptides (1* and 5*) are roughly 
comparable to those of ZFGD1, a rationally designed chimeric 
protein composed of Zif12 fused to a 60-residue linker and 
coiled-coil dimerization domain from Gal4 (1). It appears that, 
in some situations, such small peptides may be able to func- 
tionally replace larger protein domains. 
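
The free-energy bookkeeping in this paragraph can be reproduced in a few lines. The sketch below uses only the values quoted in the text (20.0, 12.7, and about 3.5 kcal/mol); the conversion to a fold change in the equilibrium constant additionally assumes a temperature of 298 K, which is not stated in the text, and the function names are hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def extension_contribution(dg_fusion: float, dg_zif12: float) -> float:
    """Binding energy attributed to the peptide extension (kcal/mol)."""
    return dg_fusion - dg_zif12


def fold_change(ddg: float, temperature_k: float = 298.0) -> float:
    """Factor by which an equilibrium constant changes for a given ddG.

    temperature_k is an assumption; the text does not give a temperature.
    """
    return math.exp(ddg / (R_KCAL * temperature_k))


ddg_optimized = extension_contribution(20.0, 12.7)  # peptides 1* and 5*
ddg_initial = 3.5                                   # peptides 1 and 5 (approximate)

print(f"optimized extension: {ddg_optimized:.1f} kcal/mol")
print(f"ratio to initial peptides: {ddg_optimized / ddg_initial:.2f}")
print(f"fold change at 298 K: {fold_change(ddg_optimized):.2e}")
```

The ratio comes out slightly above 2, consistent with the statement that the optimized extensions contribute more than twice the energy of the initial peptides.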

In the course of our reoptimizations, we may have ap- 
proached a practical limit of our selection system. Although 
our selections employed a "monovalent" display format (23), 
we assume that most of the phage that bound in a given cycle 
actually were bivalent. (Presumably, monovalent phage pre- 
dominate in the sample, but the fraction of retained phage was 
always less than 1%, and it is certainly possible that these are 
predominantly bivalent phage.) Phage displaying two copies of 
the peptide-zinc finger fusion would bind more tightly to the 
dimeric site than their monovalent counterparts, because the 
peptides attached to the same phage would be tethered at a 
high "effective concentration." The opportunity for bivalent 
binding presumably aided the initial selections but may com- 
plicate reoptimization. As higher-affinity dimers arise, pep- 
tides on bivalent phage may, aided by the high "effective 
concentration," form dimers even in the absence of DNA, and 
this would eliminate any basis for selecting tighter dimers. In 
addition, as dimer interfaces become more stable, and as 
members of the phage pool become more similar (in the 
reoptimizations, all sequences in a pool share at least 10 
residues), there also is an increased probability of two slightly 
different monovalent phage binding to a single DNA target 
molecule. Such phage heterodimers could seriously complicate 
the selection process. (Lowering the concentration of phage 
might minimize this possibility.) Finally, selection for high- 
affinity dimers may inadvertently isolate peptides with alter- 
native oligomerization states. Several of our reoptimized se- 
quences appeared to form higher-order complexes, and it is 
interesting in this context that design studies with self- 
associating amphipathic helices have shown that subtle se- 
quence changes can dramatically alter the oligomeric state 
(24). Similar adventitious effects may have occurred with some 
of our selected peptides, or there may be binding modes that 
somehow permit two bivalent phage to occupy the same DNA 
site. 

Investigating the structural details of the complexes pre- 
sented here should yield basic insights into how dimerization 
can be achieved with short peptides. The different peptide 
"classes" uncovered in this study share no obvious sequence 
similarity with each other or with natural dimerization ele- 
ments, suggesting that we have isolated several distinct motifs 
that may represent novel dimerization units. Furthermore, 
secondary structure predictions (25) for our sequences indi- 
cate very different structural propensities for the different 
peptides. The selected peptides might fold into different 
structures or pack in different ways at the dimer interface. 
Although we tend to envision dimerization in terms of 
peptide-peptide interactions, cooperative binding also would be 
obtained if the peptide extension from one fusion protein 
reached across the center of the DNA site and bound to the 
zinc finger domain of a symmetry-related molecule. In principle, 
a peptide also might induce dimer formation by promoting 
domain swapping (26) between substructures of two 
Zif12 monomers when they are bound to adjacent sites on the 
DNA. Finally, we note that improved DNA-binding affinity 
could result from additional peptide-DNA contacts (the Lys 
or Arg residues that are preferred at the position immediately 
preceding the fingers may play some such role), but these 
contacts would not be expected to contribute to the observed 
cooperativity of binding. 

The selection of dimerization elements for zinc fingers 
demonstrates that these elements are relatively common in 
sequence space and reemphasizes the utility of screening 
random polypeptide libraries when developing proteins with 
desired activities. We have shown that a sequential reoptimi- 
zation strategy can generate peptides with significantly higher 
activity, and peptide sequences such as those described here 
may prove useful for other zinc finger and DNA-binding 
protein design studies. There may be practical advantages to 
using these selected peptides (as opposed to known dimeriza- 
tion motifs such as coiled coils), because it seems less likely that 
these peptides will "crossreact" by heterodimerizing with 
natural dimerization interfaces presented by proteins in the 
cell. Further characterization of these novel motifs should 
broaden our understanding of macromolecular recognition 
and protein evolution by providing interesting comparisons to 
natural polypeptide sequences involved in dimerization and 
cooperative binding. 

Abstract: Protein-protein interactions are crucial for the assembly and function of many 
protein-DNA complexes. To explore the spectrum of structural possibilities for such 
interactions, we previously selected short peptide extensions that facilitate cooperative 
binding of zinc finger domains, and we have now determined the crystal structure of one 
such complex. We find that this peptide extension mediates dimerization by reaching 
across the twofold axis and contacting an exposed surface of the finger that is bound to 
the neighboring site. The overall features of this complex are remarkably similar to those 
seen with some homeodomain heterodimers. Addition of such contacts may provide a 
readily accessible route (both in vivo and in vitro) for enhancing the affinity and 
specificity of recognition. 

Protein-protein interactions can play critical roles in the formation of higher-order 
protein-DNA complexes and in the combinatorial control of gene expression (for 
examples, see 1, 2). Such cooperative interactions can increase the affinity and/or 
specificity of protein-DNA recognition, change the concentration dependence of binding, 
or recruit other regulatory proteins to the DNA site. One interesting example of this kind 
of contact involves cooperative binding of homeodomain heterodimers. Crystallographic 
studies of the yeast MATa1/α2 complex revealed that a carboxyl-terminal peptide tail 
from MATα2 binds to an exposed hydrophobic patch on MATa1, thereby enhancing the 
affinity and specificity of recognition (3). Structural studies have also shown that the 
Drosophila Ultrabithorax and human HoxB1 homeodomain proteins contact their 
partners (Extradenticle and Pbx1, respectively) through a conserved hexapeptide that 
packs against a hydrophobic patch on the neighboring homeodomain (4, 5). 

It is not yet known how common such peptide-protein contacts are in higher-order 
protein-DNA complexes. However, we have found very similar interactions in the 
crystal structure of a peptide-zinc finger fusion (previously obtained by in vitro 
evolution) that binds cooperatively to DNA (Fig. 1). This fusion protein had been 
selected and optimized in several different stages (6). We had begun these experiments 
by adding random 15-residue peptide extensions to the N-terminus of fingers one and two 
of Zif268 (7) and had used phage display to select fusion proteins that formed stable 
homodimers on a palindromic DNA site. Using these selected peptides as a starting 
point, we then proceeded with several rounds of randomization and reselection (under 
more stringent conditions) to obtain peptides that further stabilized homodimer 
formation. The reoptimized fusion protein that was chosen for further structural study 
[designated 1* in our previous paper (6)] is monomeric in solution and yet binds DNA 
primarily as a dimer. Half-maximal DNA binding occurs at 15 nM protein, while the 
corresponding Zif1-Zif2 construct (without the peptide extension) binds at 10 µM. 

The peptide extension obtained in this experiment had been selected to facilitate 
cooperative binding of the zinc finger domains, but there was no further constraint on the 
type of contacts (peptide-peptide, peptide-protein, or peptide-DNA) that might mediate 
this binding. To determine what interactions were made by the peptide, we crystallized 
and solved the structure of this complex. We obtained excellent crystals of the 
homodimer bound to a symmetric DNA site, prepared heavy atom derivatives by 
synthesizing iodine-substituted DNA oligos, and then solved the structure and refined it 
to 2.35 Å resolution (Table 1). As expected, our complex has two protein monomers 
arranged symmetrically on the palindromic DNA site that had been used in the selections 
(Fig. 1A). The overall structures and docking arrangements of the two zinc fingers on 
each half site are similar to those observed in the Zif268-DNA complex (7, 8). Phosphate 
contacts made by two residues (Lys 101 and Arg 103) from each peptide extension 
augment the zinc finger-DNA interactions (9). However, the rest of the peptide extension 
stretches away from the attached zinc finger domains, reaching past the center of the 
binding site and making extensive interactions with zinc finger 1 from the 
symmetry-related molecule (described in more detail below). The zinc finger surface contacted by 
the peptide involves a region where the α helix packs against the first strand of the β 
sheet, and this exposed surface lies just above the secondary strand of the DNA (Fig. 2A). 
This overall arrangement shows a striking similarity to the interactions found in the 
homeodomain heterodimers that have been studied crystallographically (Fig. 1). 

As observed with the homeodomain heterodimers, hydrophobic interactions 
dominate the peptide-protein interface in our complex (Fig. 2B). In describing these 
interactions, we use a numbering scheme that follows the convention used in the 
wild-type Zif268 structure (residue numbers 104 to 160 in the crystal structure correspond to 
residues 4 to 60 in the zinc finger sequence, ref. 7), and thus our 15-residue peptide 
extension is numbered as H89-PMNNLLNYVVPKM-R103 (to indicate that this is the 
preceding region of the polypeptide chain in our new protein). Near the twofold axis at 
the center of the site, the side chain of Pro 104 packs against Pro 104' and Tyr 105' (from 
the other subunit), while Met 102 interacts with Pro 104, Pro 104', and Tyr 105'. In this 
region, there also is a hydrogen bond between the carbonyl oxygen of Ser 117 and the 
hydroxyl group of Tyr 105'. Further outward along the peptide, the side chains of Val 
and Pro 100 form a nonpolar surface that supports the side chain of Tyr 97 (which also 
interacts with Leu 94). Leu 94, Leu 95, Tyr 97, and Val 99 of the peptide extension 
contact a number of residues in zinc finger 1 of the other monomer and thus form a key 
part of the dimer interface. Leu 94 fits into a hydrophobic pocket formed by zinc finger 
residues Pro 108', Val 109', Ile 126', His 129', and Thr 130'. Leu 95 contacts nonpolar 
groups on the side chains of Thr 130' and Gln 132'; Tyr 97 touches Pro 108' and Ile 
126'; and Val 99 interacts with Tyr 105', Ser 119', Leu 122', Thr 123', and Ile 126'. 
Along the edges of this extensive, hydrophobic interface, there are several bridging water 
molecules, but the water-mediated interactions that we observe are not conserved among 
the crystallographically independent copies of the dimer interface. 
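
The numbering convention can be checked by assigning residue numbers to the extension programmatically. The sketch below assumes the 15-residue extension reads HPMNNLLNYVVPKMR (His 89 through Arg 103); the two-valine reading is an assumption chosen because it is the one consistent with the residue names used in the text (Tyr 97, Val 99, Pro 100, Lys 101, Met 102, Arg 103). The helper names are hypothetical.

```python
EXTENSION = "HPMNNLLNYVVPKMR"  # assumed reading of the 15-residue extension
FIRST_RESIDUE = 89             # His 89 in the crystal-structure numbering

THREE_LETTER = {
    "H": "His", "P": "Pro", "M": "Met", "N": "Asn", "L": "Leu",
    "Y": "Tyr", "V": "Val", "K": "Lys", "R": "Arg",
}


def number_residues(sequence: str, start: int) -> dict:
    """Map crystal-structure residue numbers to three-letter residue names."""
    return {start + i: THREE_LETTER[aa] for i, aa in enumerate(sequence)}


def crystal_to_finger_number(crystal_number: int) -> int:
    """Convert crystal numbering (104-160) to zinc finger numbering (4-60)."""
    return crystal_number - 100


numbered = number_residues(EXTENSION, FIRST_RESIDUE)
for number, name in numbered.items():
    print(f"{name} {number}")
```

With this reading, every peptide residue named in the description of the interface falls at the stated position, and the extension ends at Arg 103, immediately before residue 104 of the zinc finger region.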

Our structure has a number of interesting implications for understanding zinc 
finger-DNA interactions and for understanding the origin of cooperativity among 
DNA-binding proteins. One of the most important observations is that relatively weak 
interactions between a peptide extension and the surface of a neighboring 
protein (contacts which are not sufficient to give stable dimers in solution) can still 
dramatically stabilize the corresponding protein-DNA complexes. Both the peptide and 
the surface it recognizes are present at high "local concentrations" when these proteins 
bind to adjacent sites on the DNA. Because these peptide extensions bind in the context 
of existing protein-DNA complexes, they do not need to have prefolded structures with a 
precise fit or to provide much energy. (The binding energy of the protein-DNA 
interaction can help overcome the entropy losses that otherwise would be involved with 
dimer formation.) We imagine that such structures represent a readily accessible 
"evolutionary path" for generating cooperativity in the formation of higher-order 
protein-DNA complexes, and our selection studies have shown that such peptides are rather 
common in pools of random peptide sequences (6). It appears that a variety of such 
peptide-protein contacts use similar structural principles. For example, there are 
substantial differences between the MATa1/α2 (3) and the Ultrabithorax/Extradenticle 
(4) complexes (with respect to the structure of the extension and the arrangement of 
homeodomains on the DNA), but the underlying principles in both cases seem quite 
similar (10). 

A number of reports have described biochemical experiments indicating 
protein-protein interactions mediated via Cys2-His2 zinc finger domains (11), and it seems quite 
plausible that the hydrophobic surface used in our complex may be involved in some of 
these other interactions. In this regard, there are interesting parallels between the contact 
surface exploited by our selected peptide and protein-protein interaction surfaces 
observed with the GLI and SWI5 zinc finger proteins. The structure of the previously 
reported GLI zinc finger-DNA complex (12) showed that GLI finger 1 makes no 
protein-DNA contacts, but instead interacts extensively with GLI finger 2. Protein-protein 
contacts involving zinc fingers were also observed for the SWI5 DNA-binding domain, 
which includes an N-terminal extension that folds into an additional β strand and an α 
helix that packs against SWI5 finger 1 (13). Strikingly, the interactions in each case (the 
surfaces used by GLI finger 1, by GLI finger 2, and by SWI5 finger 1) all involve 
hydrophobic contacts to sites that are nearly identical to the region of Zif268 finger 1 
contacted in our current structure (Fig. 3). It thus appears that this surface of the zinc 
finger is particularly well suited for associating with other protein structures, and we 
anticipate that it may play a similar role in many other complexes. [The correspondence 
of these structures is especially interesting since, in other cases, it has been shown that 
peptides selected by phage display targeted natural protein-protein interaction surfaces 
(14, 15).] 

Dimerization modules of the type reported here may be useful when designing 
new zinc finger proteins that recognize extended binding sites, and such modules may 
provide effective alternatives to covalent linkage (16, 17) or to the use of coiled-coil 
dimerization domains (18, 19). Our results indicate that this peptide extension could be 
used with Zif variants that recognize alternative sites (i.e., the structure suggests that the 
peptide-protein contacts responsible for cooperative binding should function relatively 
independently of the zinc finger-DNA contacts responsible for site-specific recognition). 
Many further improvements through design and/or selection also seem feasible. It should 
be possible to (1) further optimize these peptide-protein interactions, to (2) obtain 
variants that stabilize head-to-tail binding of zinc finger proteins, and to (3) isolate 
peptides that specifically contact other proteins bound at neighboring sites. 
Preassociation of protein domains to adjacent regions on DNA can provide a starting 
point for subsequent selection or evolution of modules that allow cooperative binding or 
enhanced specificity. The similarity between our selected zinc finger homodimer and the 
homeodomain heterodimers highlights the important roles that peptide extensions can 
play in the formation of these complexes and illustrates how design, selection, and 
evolution can exploit common underlying physical principles. 

22. Because of the different lengths of the fingers, the superposition aligned Cα atoms 
of residues 104-114 and 116-127 of Zif finger 1 with those of residues 3-13 and 
15-26 of GLI finger 1; residues 105-114 and 116-130 of Zif finger 1 with 37-46 
and 51-65 of GLI finger 2; and residues 104-130 of Zif finger 1 with 31-57 of SWI5 
finger 1. 

23. The fusion peptide contains an NH2-Met-Glu-Pro leader peptide, the 15 residues 
of the selected peptide extension (6), and residues 4 to 60 of Zif268 (7). The 
peptide was overexpressed, purified, and prepared for crystallization essentially as 
described previously for Zif268 variant peptides (20). 

24. The self-complementary oligonucleotide 5'-ATGGGCGCGCCCAT-3' was 
purified as described previously (21) and annealed at a high concentration (3 mM 
in duplex) to favor formation of intermolecular duplexes over intramolecular 
hairpins. Derivative oligos contained either 5-iodouracil at position 2 or 
5-iodocytosine at position 8. 

25. Equal volumes of protein (1.1 mM in dimer) and duplex DNA (1.5 mM) were 
mixed, and complexes were solubilized with the addition of NaCl to 0.4 M. 
Crystals were grown in an anaerobic chamber using hanging drop vapor diffusion 
from drops containing the complex and an equal volume of the well solution 
(13-20% PEG-4,000 / 50-130 mM NaCl / 10 mM MgCl2 / 30 mM MES, pH 6.2). The 
crystals were soaked for approximately 5 minutes in a solution containing equal 
volumes of well solution and cryoprotectant solution (38% glycerol / 20% 
PEG-4,000 / 100 mM NaCl / 50 mM MES, pH 6.2) and flash-cooled in a stream of 
cold nitrogen (126 K). 

26. Z. Otwinowski, W. Minor, Methods Enzymol. 276, 307 (1997). 

27. T. O. Yeates, Methods Enzymol. 276, 344 (1997). 

28. Collaborative Computational Project Number 4, Acta Crystallogr. D50, 760 
(1994). 

29. In the crystal, DNA duplexes stack end-to-end, although the basepairs at the 
junctions are rotationally offset so that pseudocontinuous helices are not formed. 
DNA stacks run along each crystallographic 3_1 screw axis. Thus, there are three 
stacks of three complexes in each unit cell, and complexes within each stack are 
crystallographically related. The three crystallographically independent 
complexes (and the two crystallographically independent halves of each complex) 
are very similar to one another. However, one of the protein monomers showed 
poor density for several residues at the N-terminus (Asn 93 to Val 98) and for 
finger 2, especially in the beta hairpin region. (There are few crystal contacts in 
these regions, and they therefore may be more mobile.) With these exceptions, 
residues Asn 93 to Thr 158 of each protein monomer and the entire DNA duplex 
for each complex are visible in our structure. 

30. T. Terwilliger, J. Berendzen, Acta Crystallogr. D55, 849 (1999). 

31. SOLVE located [?] of 6 sites in derivative IdU-2 and 3 of 6 sites in derivative 
IdC-8. Since there were three copies of each duplex in the asymmetric unit, and 
because each duplex was expected to have the same distribution of iodine sites, 
superimposing corresponding sets of heavy atom positions allowed us to predict 
the remaining heavy atom sites and to determine approximate noncrystallographic 
symmetry (NCS) operators. 

32. T. A. Jones, J.-Y. Zou, S. W. Cowan, M. Kjeldgaard, Acta Crystallogr. A47, 110 
(1991). 

33. A. T. Brünger, X-PLOR Version 3.1: A System for X-ray Crystallography and 
NMR (Yale University Press, New Haven, CT, 1992). 

34. Supported by the Howard Hughes Medical Institute. We thank the staff at NSLS 
Beamline X4A for the use of their facility; E. Peisach for assistance with data 
collection; and E. Peisach, S. Fay-Richard, S. Wolfe, and T. Yeates for helpful 
discussions. B.S.W. was a Howard Hughes Medical Institute predoctoral fellow. 
Coordinates for the crystal structure have been deposited in the Protein Data Bank 
(accession code 1F2I). 
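
The self-complementarity stated for the oligonucleotide in note 24 can be verified by comparing the sequence with its reverse complement. The sketch below reads the 14-mer as 5'-ATGGGCGCGCCCAT-3', with G at position 5; that reading is an assumption, chosen because it is the one that satisfies self-complementarity and places C at position 8 and T at position 2, matching the iodinated positions. The function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_self_complementary(seq: str) -> bool:
    """True if the strand can pair with a second copy of itself as a duplex."""
    return seq.upper() == reverse_complement(seq)


def mismatch_positions(seq: str) -> list:
    """1-based positions where the sequence differs from its reverse complement."""
    rc = reverse_complement(seq)
    return [i + 1 for i, (a, b) in enumerate(zip(seq.upper(), rc)) if a != b]


OLIGO = "ATGGGCGCGCCCAT"  # assumed reading of the 14-mer
print(is_self_complementary(OLIGO), mismatch_positions(OLIGO))
print("position 2:", OLIGO[1], "position 8:", OLIGO[7])
```

Each half of this palindrome contains the TGGGCG subsite, so two copies of the strand form a duplex with two symmetry-related binding sites.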

Figure 1. Crystal structure of our zinc finger homodimer and comparisons with the 
MATa1/α2 homeodomain heterodimer. (A) Overview of our homodimer complex. 
Protein monomers (colored in red and yellow) bind in a head-to-head orientation on the 
DNA; the peptide extension and zinc fingers for the monomer in yellow are labelled. The 
complex is approximately symmetric, with a twofold axis that goes through the center of 
the DNA and is perpendicular to the plane of the page. There are two zinc fingers in each 
monomer, and these bind essentially as observed in the wild-type Zif268 complex (finger 
2 for each monomer is hard to see in this figure since it is almost directly behind finger 1 
in this view of the complex). (B) Crystal structure of the MATa1/α2 heterodimer-DNA 
complex as determined by Wolberger and colleagues (3). A peptide extension from the 
MATα2 homeodomain (red) contacts an exposed hydrophobic surface on the MATa1 
homeodomain (yellow) to facilitate cooperative binding. 

Figure 2. Peptide-protein contacts at the dimer interface of our zinc finger complex. For 
simplicity, only contacts in one half of the symmetric dimer are shown; an equivalent set 
of contacts is seen in the other half of the dimer. (A) View of the dimer interface 
showing the peptide extension (yellow ribbon and stick figure) fitting against the zinc 
finger (surface representation). Most of the zinc finger is colored red, but residues that 
interact with the peptide extension are colored in gray to show the extensive contact 
surface that is involved. The DNA in this part of the complex is shown in blue. (B) 
Diagram highlighting key residues of the peptide extension (yellow) and zinc finger 1 
(red) at the dimer interface. 

Figure 3. Comparison of the contact surfaces of various zinc fingers. Zif268 finger 1, 
GLI finger 1, GLI finger 2, and SWI5 finger 1 are shown in red, green, cyan, and yellow, 
respectively. (A) Stereoview of a superposition of the four fingers (22). Side chains on 
each finger that are involved in hydrophobic contacts at the corresponding protein-protein 
interface have been rendered in stick representation. (Coordinates for GLI and SWI5 
from refs. 12, 13.) (B) Alignment of the sequences of the fingers, with interacting 
residues boxed. The portion of Zif268 shown here corresponds to residues 104 to 130 in 
the crystal structure. 

Table 1. Crystallographic Analysis. The zinc finger fusion peptide (23) was 
cocrystallized with a 14-base pair DNA duplex (24) using PEG 4000 as the precipitant 
(25). The crystals form in spacegroup P3, with cell dimensions a = b = 86.2 Å, c = 113.0 
Å. Diffraction data (for Native-1 and the derivatives) were first collected on cryocooled 
crystals using a rotating anode X-ray generator and an R-Axis IV image plate system. 
Data were processed with the HKL suite (26). The crystals exhibited partial merohedral 
twinning, and the twin fraction in each crystal (as listed in the Table) was estimated 
according to the procedure of Yeates (27). Data were detwinned using the DETWIN 
program in the CCP4 suite (28). There are three dimer-DNA complexes per asymmetric 
unit (29). Iodine sites in the derivatives were located with SOLVE (30), and MIR phases 
(31) were improved by solvent flattening and noncrystallographic symmetry (NCS) 
averaging using DM (28). The resulting experimental electron density map was readily 
interpretable, and a model was built with O (32). We refined the model to 2.7 Å with 
X-PLOR (33) using the Native-1 data set. As refinement progressed, we relaxed NCS 
constraints to restraints and then eliminated NCS restraints altogether, refined grouped 
B-factors, and applied a bulk solvent correction. We then collected an additional data set 
(Native-2) at the National Synchrotron Light Source on Beamline X4A (λ = 1.0093 Å), 
and these data were detwinned and merged with the detwinned Native-1 data. Using this 
higher-resolution data set, we proceeded with further positional refinement and 
individual, restrained B-factor refinement and also added 319 water molecules to the 
model. In the final model, 90.9% of the residues lie in the core regions of the 
Ramachandran plot and the remaining residues occupy additional allowed regions. 

Data Collection and MIR Phasing 

                                 Native-1      IdU-2         IdC-8         Native-2 
Resolution, Å                    20-2.7        20-3.0        20-2.7        20-2.35 
Measured reflections             85,644        87,027        97,771        191,628 
Unique reflections               30,331        22,186        29,250        45,80[?] 
Completeness, %                  100 (99.9)    99.9 (99.5)   99.9 (100)    99.9 (99.8) 
[row label missing]              7.0 (38.4)    14.0 (56.0)   13.2 (74.7)   8.1 (34.5) 
Reflections with I/σ(I)>2, %     77.9 (40.8)   82.2 (46.9)   64.4 (30.3)   89.[?] (65.4) 
Twin fraction                    0.05          0.2           0.08          0.26 
[row label missing]                            17.0          25.9 
Phasing power                                  1.02          0.83 
Figure of merit, MIR             0.33 

Values in parentheses for highest resolution shell. 

Refinement 

Resolution range, Å                        20-2.35 
Reflections, F>2σ(F)                       38,060 
# of Non-H atoms                           5329 
[row label missing]                        21.0 
[row label missing]                        25.6 
Average B-value (Å^2)                      41.4 
rmsd ΔB-values, bonded atoms (Å^2)         4.1 
rmsd from ideal, bond lengths, Å           0.007 
rmsd from ideal, bond angles, °            1.4 
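
The twin fractions listed in Table 1 enter the detwinning step through a simple linear relation between observed and true intensities of twin-related reflections. The sketch below shows the standard algebra for a crystal with two twin domains; it is a minimal illustration under that assumption and is not presented as the implementation used by the DETWIN program. The function names and the example intensities are hypothetical.

```python
def twin(j1: float, j2: float, alpha: float) -> tuple:
    """Observed intensities for a twin-related pair with true intensities j1, j2."""
    i1 = (1.0 - alpha) * j1 + alpha * j2
    i2 = alpha * j1 + (1.0 - alpha) * j2
    return i1, i2


def detwin(i1: float, i2: float, alpha: float) -> tuple:
    """Invert the twinning relation to recover the true intensities.

    alpha is the twin fraction; the inversion is undefined at alpha = 0.5
    and becomes increasingly noise-sensitive as alpha approaches 0.5.
    """
    if not 0.0 <= alpha < 0.5:
        raise ValueError("twin fraction must satisfy 0 <= alpha < 0.5")
    denom = 1.0 - 2.0 * alpha
    j1 = ((1.0 - alpha) * i1 - alpha * i2) / denom
    j2 = ((1.0 - alpha) * i2 - alpha * i1) / denom
    return j1, j2


# Round trip with the Native-2 twin fraction from Table 1 and made-up intensities.
observed = twin(100.0, 40.0, 0.26)
print(observed, detwin(*observed, 0.26))
```

Because the denominator is 1 - 2α, the correction amplifies measurement error more strongly for the data set with twin fraction 0.26 than for those with fractions of 0.05 to 0.2.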

WHAT IS CLAIMED IS: 

1. A nonnaturally occurring dimerizing peptide. 

2. The peptide of claim 1 that is a homo-dimerizing peptide. 

3. The peptide of claim 1 that lacks significant sequence identity with 
a naturally occurring peptide. 

4. The peptide of claim 1 having a length of 30 amino acids or 
shorter. 

5. A zinc finger complex, comprising a first fusion protein comprising 
a first zinc finger protein and a first peptide linker and a second fusion protein comprising 
a second zinc finger protein and a second peptide linker, wherein the first and second 
fusion proteins are complexed by specific binding of the first and second peptide linkers, 
and wherein the first and second peptide linkers are nonnaturally occurring peptides. 

6. The zinc finger complex of claim 5, wherein the first and second 
peptide linkers are first and second copies of the same linker. 

7. A method of selecting a dimerizing peptide, comprising: 

(a) providing a phage display library in which a member displays a zinc 
finger protein fused to a peptide from its outer surface, the zinc finger protein being the 
same in different members, and the peptide varying between different members; 

(b) contacting the library with a nucleic acid substrate comprising first and 
second binding sites for the zinc finger protein, whereby phage displaying a zinc finger 
protein fused to a dimerizing peptide preferentially bind to the substrate relative to phage 
displaying a zinc finger protein fused to a nondimerizing peptide; 

(c) isolating the phage that bind to the substrate; and 

(d) sequencing a segment of the genome of a phage binding to the 
substrate to determine the identity of a dimerizing peptide. 

8. The method of claim 7, further comprising repeating steps (a)-(c) with 
the phage display library in (a) in one cycle comprising phage from step (c) in a previous 
cycle. 

9. The method of claim 7, further comprising 
repeating steps (a)-(c) with the phage display library in step (a) in a 
subsequent cycle comprising phage encoding peptides that are variants of a peptide 
encoded by a phage in step (c) from the previous cycle. 

10. The method of claim 7, wherein the first and second binding sites 
are in opposing orientations in the substrate. 

11. The method of claim 7, wherein the phage displaying a zinc finger 
protein fused to a dimerizing peptide bind to the substrate via display of two copies of 
the zinc finger protein and the dimerizing peptide, whereby the two copies of the zinc 
finger protein respectively bind to the first and second binding sites. 

12. The method of claim 7, wherein the peptide is a random peptide. 

13. The method of claim 7, wherein the peptide is 30 amino acids or 
fewer in length. 

14. The method of claim 7, wherein the peptide is 15 amino acids or 
fewer in length. 

15. A method of regulating or detecting a target sequence, comprising: 
contacting the target sequence with a zinc finger complex, comprising a 
first fusion protein comprising a first zinc finger protein that specifically binds a segment 
of the target sequence and a first peptide linker and a second fusion protein comprising a 
second zinc finger protein that specifically binds a second segment of the target sequence 
and a second peptide linker, whereby the first fusion protein binds to the first segment of 
the target sequence, and the second fusion protein binds to the second segment of the 
target sequence, and the first and second fusion proteins bind to each other via the first 
and second peptides. 

16. The method of claim 15, wherein the target sequence is present in 
an intact cell. 

17. The method of claim 15, further comprising contacting the cell 
with an expression vector encoding the first fusion protein and/or the second fusion 
protein, wherein the expression vector enters the cell and is expressed to produce the first 
and/or second fusion protein. 

18. The method of claim 15, wherein the target sequence is present in a 
patient. 

19. The method of claim 15, wherein the target sequence is present in 
a cell extract. 

ABSTRACT 

The invention provides nonnaturally occurring dimerizing peptides, and 
methods for their production. Such peptides are useful to mediate association of linked 
functional protein domains. In particular, such peptides are useful for mediating 
association of complexes of multiple zinc finger proteins, thereby affording greater 
specificity and/or affinity in binding of the zinc finger proteins to proximately spaced 
target segments. 